XML::Twig problem

by BenHopkins (Sexton)
on May 31, 2007 at 00:11 UTC

BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:

I have a little program based on the "Building an XML Filter" section of the XML::Twig docs. It sets up twig_roots for the elements I need to process and uses twig_print_outside_roots for everything else. However, the output is NOT valid XML (the input is). Here's how the input starts:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd" [
    <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
    %HTMLlat1;
    <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
    %HTMLsymbol;
    <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
    %HTMLspecial;
    ]>

Here's the output:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd">
    <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
    %HTMLlat1;
    <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
    %HTMLsymbol;
    <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
    %HTMLspecial;
    >

The square brackets surrounding the three !ENTITY declarations are missing. Here's the program's call to XML::Twig->new:

    my $t = XML::Twig->new(
        twig_roots => {
            "$nitf_root/body/body.head/hedline/hl1" => \&fix_hl1,
            "$nitf_root/body/body.head/hedline/hl2" => \&fix_hl2,
        },
        twig_print_outside_roots => 1,
        keep_encoding            => 1,
    );
At first I didn't have keep_encoding; at that point, besides the missing square brackets, the first !ENTITY declaration was also missing. Adding keep_encoding restored the first !ENTITY, but not the brackets.

Any ideas?

(I do a flush after the parse, so it's not that. I don't see how it could affect anything, but I saw something about it.)
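
For readers who have not used this filter pattern, here is a minimal sketch of the setup the question describes. The handler body is hypothetical (the real fix_hl1 is not shown in the thread), the root is a bare element name rather than the "$nitf_root/..." path, and filter_xml is just an illustrative wrapper that captures the printed output in a string. The one point it tries to illustrate: under twig_print_outside_roots, the handler has to print the element itself.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical handler (the real fix_hl1 is not shown in the thread):
# upper-case the headline text, then print the element. Under
# twig_print_outside_roots the handler has to print the element itself.
sub fix_hl1 {
    my ( $twig, $elt ) = @_;
    $elt->set_text( uc $elt->text );
    $elt->print;
}

# Illustrative wrapper: run the filter over an XML string and return
# whatever the twig printed.
sub filter_xml {
    my ($xml) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my $old_fh = select $fh;    # capture everything printed during the parse
    my $ok = eval {
        XML::Twig->new(
            twig_roots               => { 'hl1' => \&fix_hl1 },
            twig_print_outside_roots => 1,
        )->parse($xml);
        1;
    };
    my $err = $@;
    select $old_fh;             # always restore the previous handle
    close $fh;
    die $err unless $ok;
    return $out;
}

print filter_xml('<doc><hl1>hello</hl1><p>kept</p></doc>'), "\n";
```

In a real filter you would skip the capture and let the twig print straight to STDOUT, as the program in the question does.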

Replies are listed 'Best First'.
Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 06:34 UTC

    The development version of XML::Twig (at http://xmltwig.com/xmltwig) nearly fixes this: the brackets are there, but the first entity declaration is not output properly (it comes out as <!ENTITY HTMLlat1 SYSTEM "../CCI-DTD/xhtml-lat1.ent" PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" >). The other two are output properly, which is quite puzzling. This occurs whether keep_encoding is used or not. I'll fix it and report back.

    I have refactored the code that outputs the internal DTD in the new version, so tests are welcome. Apparently my test suite did not cover this case, so I will add a test too.

    Thanks for the info.

Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 11:23 UTC

    OK, it looks like I did not process parameter entities properly.

    It's fixed in the development version; let me know if it works for you.

      It works. Thanks. But (you knew there would be a but, didn't you?), in the output, there was some trailing text after the final tag, which went away when I took out the flush() call.

      Also, when I tried to verify the soundness of the output XML with xml_pp (mine, not yours), I got this error: Undefined subroutine &Text::Wrap::wrap called at /usr/local/perl/5.8.2/lib/site_perl/5.8.2/XML/Twig.pm line 7476. When I replaced indented_c with nice, it worked.

        Indeed, the flush messes things up when you're using twig_print_outside_roots. I should check for that; I'll see what I can do. In fact, with recent versions of the module, the flush after the end of the parsing is no longer needed. The module assumes that if you started flushing, then you want to keep on doing it (or you would most likely get non-well-formed XML), so at the end of the parse, if flush has been used on the twig, it performs a last flush, using the filehandle that was used for the first flush. It DWIMs better than it reads ;--(.
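
To make the flush behaviour described above concrete, here is a small sketch of the other style of filter, the one built on twig_handlers and flush instead of twig_print_outside_roots. It assumes a version of XML::Twig recent enough to do the automatic last flush; the element name, the upper-casing and the flush_filter wrapper are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative wrapper: filter an XML string with flush and return the
# printed output. Note there is no flush call after parse().
sub flush_filter {
    my ($xml) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my $old_fh = select $fh;    # capture everything printed during the parse
    my $ok = eval {
        XML::Twig->new(
            twig_handlers => {
                hl1 => sub {
                    my ( $twig, $elt ) = @_;
                    $elt->set_text( uc $elt->text );
                    $twig->flush;    # print everything parsed so far
                },
            },
        )->parse($xml);    # the rest is flushed by the module itself
        1;
    };
    my $err = $@;
    select $old_fh;
    close $fh;
    die $err unless $ok;
    return $out;
}

print flush_filter('<doc><hl1>hello</hl1><p>kept</p></doc>'), "\n";
```

The two styles should not be mixed: use either twig_roots with twig_print_outside_roots and print in the handlers, or twig_handlers with flush.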

        As for the xml_pp problem, I don't know. Maybe you redefined the constants and ended up using the one for 'wrapped' or 'cvs' instead of the one for 'indented_c'? What value are you using for the style? BTW, I usually use xmlwf, xmllint or perl -MXML::Parser -e'XML::Parser->new( ErrorContext => 1)->parsefile( shift())' file.xml to check the well-formedness of the XML.
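
The XML::Parser one-liner above can also be written as a small script, which is handy when checking several files. This is only a sketch: the function name well_formed_error is made up, and like the one-liner it checks well-formedness only, not validity against the DTD.

```perl
use strict;
use warnings;
use XML::Parser;

# Return undef if the file is well-formed, otherwise the parser's error
# message. ErrorContext => 1 adds a line of context around the error.
sub well_formed_error {
    my ($file) = @_;
    my $parser = XML::Parser->new( ErrorContext => 1 );
    my $ok = eval { $parser->parsefile($file); 1 };
    return $ok ? undef : $@;
}

# Check every file named on the command line.
for my $file (@ARGV) {
    my $err = well_formed_error($file);
    print defined $err ? "$file: $err\n" : "$file: well-formed\n";
}
```

Run against the broken output from the original question, a check like this should complain about the DOCTYPE, since the entity declarations sit outside any internal subset.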
