PerlMonks  

XML::Rules parsing inside out?

by bfdi533 (Friar)
on Dec 06, 2017 at 23:52 UTC ( [id://1205065]=perlquestion )

bfdi533 has asked for the wisdom of the Perl Monks concerning the following question:

I have been scratching my head trying to figure out what is wrong with my parsing rules. I put some print statements in the rules and found that XML::Rules is parsing the XML "inside out". I have been reading through the XML top to bottom to develop my rules and logic, and kept finding that I was missing some info. Now I see why.

Take the following example:

    <?xml version="1.0"?>
    <xmltest xmlns="http://localhost/nothing">
      <summary>
        <item>
          <value>1.0</value>
        </item>
      </summary>
      <detail1>
        <item>
          <value>2.0</value>
        </item>
      </detail1>
      <detail2>
        <item>
          <value>3.0</value>
        </item>
      </detail2>
    </xmltest>

I have been attacking this (seemingly) incorrectly with rules like the following:

    my @rules_lpo = (
        _default => sub {
            $_[0] => $_[1]->{_content};
            print "$_[0] : $_[1]->{_content}\n";
        },
        'summary' => sub {
            $insummary = 1;
        },
        'detail1' => sub {
            $ind1 = 1;
            $insummary = 0;
        },
        'detail2' => sub {
            $ind2 = 1;
            $ind1 = 0;
        },
        'item' => sub {
            if ($insummary) {
                $sumvalue = $_[1]->{value};
            }
            if ($ind1) {
                $d1value = $_[1]->{value};
            }
            if ($ind2) {
                $d2value = $_[1]->{value};
            }
        }
    );

This does not work. value is processed before item, which is processed before the enclosing tag, so the flags get set AFTER the contents of the section have already been processed. I only saw this from the print statement in the '_default' rule, which I added out of sheer frustration.

The result is that $sumvalue ends up with the detail1 value, $d1value ends up with the detail2 value, and $d2value is never set. I have been losing my mind over this.

How do I properly handle duplicate tags in different sections if I cannot flag them, given that the XML is processed in this inside-out manner?

Update: Updated the XML to have proper wrapper.

Replies are listed 'Best First'.
Re: XML::Rules parsing inside out?
by runrig (Abbot) on Dec 07, 2017 at 22:22 UTC

    XML::Rules is infinitely better than XML::Simple. You don't seem to understand that a tag is not fully 'parsed' until the end of the tag is reached. If you want to catch the start of the tag (along with its attributes), then use a start tag rule (either prefix the tag with "^", or use the 'start_rules' argument).

    This is sensible because you cannot pass the contents of inner tags up to the outer tag rules (which is possible with XML::Rules) until the end tag of the outer XML tag is reached.
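
To see the firing order for yourself, here is a minimal sketch that records when each handler runs. It uses a cut-down copy of the XML from the question; the @trace array and the handler bodies are illustrative only and are not taken from anyone's code in this thread.

```perl
use strict;
use warnings;
use XML::Rules;

# Cut-down version of the XML from the question (namespace dropped for brevity).
my $xml = <<'XML';
<xmltest>
  <summary><item><value>1.0</value></item></summary>
  <detail1><item><value>2.0</value></item></detail1>
</xmltest>
XML

my @trace;    # illustrative: the order in which handlers fire

my $parser = XML::Rules->new(
    # Start rules fire at the opening tag. Returning a true value tells
    # XML::Rules to keep processing the tag instead of skipping it.
    start_rules => [
        'summary,detail1,item,value' => sub { push @trace, "start $_[0]"; return 1 },
    ],
    # Ordinary rules fire at the closing tag, so the innermost tag comes first.
    rules => [
        'summary,detail1,item,value' => sub { push @trace, "end $_[0]"; return },
        _default => sub { return },
    ],
);
$parser->parse($xml);

print "$_\n" for @trace;
```

For the summary branch the trace should read start summary, start item, start value, end value, end item, end summary: the ordinary rule for value has already run by the time the ordinary rule for summary is called.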

      Here's an (untested) example:
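          use XML::Rules;
          use Data::Dumper;

          my %tag_value;
          my $tag_name;
          my $xr = XML::Rules->new(
              rules => [
                  # Fires at </value>; $tag_name was set by the enclosing section's start rule.
                  value => sub { $tag_value{$tag_name} = $_[1]->{_content} },
              ],
              start_rules => [
                  # Fires at the opening tag. It returns the tag name, which is true,
                  # so the branch is processed rather than skipped.
                  'summary,detail1,detail2' => sub { $tag_name = $_[0] },
              ],
          );
          $xr->parse($xml);    # I left the xml out
          print Dumper \%tag_value;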
      Note that this could be done much differently by passing the 'value' content up through the 'item' tag to the item's parent tag, returning the entire data structure. I'll leave that as an exercise...(Hint: use the 'content' rule for value and 'pass no content' for item tags, and 'no content by <attr>' for the other tags).
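
One possible take on that exercise, offered as a sketch rather than a solution for the real data. Because the sample tags carry no attributes, it uses plain 'no content' for the section tags instead of 'no content by <attr>'. The heredoc, the $data variable and the final map are assumptions added for illustration.

```perl
use strict;
use warnings;
use XML::Rules;

my $xml = <<'XML';
<?xml version="1.0"?>
<xmltest xmlns="http://localhost/nothing">
  <summary><item><value>1.0</value></item></summary>
  <detail1><item><value>2.0</value></item></detail1>
  <detail2><item><value>3.0</value></item></detail2>
</xmltest>
XML

my $parser = XML::Rules->new(
    rules => [
        # value returns (value => $text) to its parent.
        value => 'content',
        # item dissolves into its parent: its keys are merged into the section's hash.
        item => 'pass no content',
        # Sections and the root return (tagname => \%hash) without whitespace content.
        'summary,detail1,detail2,xmltest' => 'no content',
    ],
);

# No globals involved: the whole structure comes back from parse().
my $data = $parser->parse($xml);

my %tag_value = map { $_ => $data->{xmltest}{$_}{value} } qw(summary detail1 detail2);
print "$_ = $tag_value{$_}\n" for sort keys %tag_value;
```

Note that with several item tags in one section, a later value would overwrite an earlier one under these rules, so real data with repeated items needs an array-building rule instead.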

      Good explanation and I did some more digging and found that $_[2] is an array of the prior tags in the "stack".

      So, if I do the following I can determine which item I am in and handle appropriately:

          'item' => sub {
              my $in_item = $_[2][$#{$_[2]}];
              if ($in_item eq "summary") {
                  $hash{summary} = $_[1]->{value};
              }
              if ($in_item eq "detail1") {
                  $hash{detail1} = $_[1]->{value};
              }
              if ($in_item eq "detail2") {
                  $hash{detail2} = $_[1]->{value};
              }
          }

      This gets me what I want perfectly! Obviously my actual XML is much more complicated than this contrived example (and I sometimes have an array of items in detail2), so this works quite well to determine what 'value' belongs to.

      Thank you for the time to answer and steer me in my research to get where I need to go.

        Good explanation and I did some more digging ... This gets me what I want perfectly! Obviously my actual XML is much more complicated than this contrived example

        Why XML::Rules?

        To me this is clearly XML::Twig or xpath territory

Re: XML::Rules parsing inside out?
by Anonymous Monk on Dec 06, 2017 at 23:59 UTC

      Here is the result from xml2XMLRules.pl on the (updated) XML above:

          [user@host cwd]> xml2XMLRules.pl xmltest.xml
          {
            'value' => 'content',
            'detail1,detail2,item,summary,xmltest' => 'no content'
          }

      Not much help as I already know 'value' is the only thing with content but I do not know what it belongs to if I were to use those rules ...

        Hi

        So, um, why not use those rules? Then work with the resulting hash?

        *tip* ?node_id=3989;BIT=XML%3A%3ARules-%3Enew;HIT=xml ... Re: XML::LibXML drives me to drinking

        This might be something like what you were attempting

            #!/usr/bin/perl --
            use strict;
            use warnings;
            use XML::Rules;
            use Data::Dump qw/ dd /;

            my $rawxml = q{<?xml version="1.0" encoding="UTF-8"?>
            <root>
            <summary>
              <item>
                <value>1.0</value>
              </item>
            </summary>
            <detail1>
              <item>
                <value>2.0</value>
              </item>
            </detail1>
            <detail2>
              <item>
                <value>3.0</value>
              </item>
            </detail2>
            <value> 11 </value>
            </root>
            };

            dd( XML::Rules->new( rules => [], )->parse( $rawxml ) );

            dd( XML::Rules->inferRulesFromExample( $rawxml ) );

            dd(
                XML::Rules->new(
                    rules => XML::Rules->inferRulesFromExample( $rawxml ),
                )->parse( $rawxml )
            );

            my ( $summary, $detail1, $detail2 ) ;
            my $xr = XML::Rules->new(
                qw/ stripspaces 8 /,
                rules => {
                    'detail1,detail2,item,root,summary' => sub { return; },
                    'value' => [
                        '/root/summary/item' => sub {
                            ( $summary, $detail1, $detail2 ) = (); #reset
                            $summary = $_[1]->{_content};
                            return;
                        },
                        '/root/detail1/item' => sub {
                            $detail1 = $_[1]->{_content};
                            return;
                        },
                        '/root/detail2/item' => sub {
                            $detail2 = $_[1]->{_content};
                            warn "$summary $detail1 $detail1\n";
                            return;
                        },
                        sub { die "unexpected 'value' at ".join('/','',@{$_[2]}) },
                    ],
                },
            );
            my $ret = $xr->parse( $rawxml );
            dd( $ret );
            __END__
            $ perl xml-rules-1205065.pl
            {
              root => {
                _content => "\n\n\n\n\n",
                detail1 => {
                  _content => "\n \n",
                  item => { _content => "\n \n ", value => { _content => "2.0" } },
                },
                detail2 => {
                  _content => "\n \n",
                  item => { _content => "\n \n ", value => { _content => "3.0" } },
                },
                summary => {
                  _content => "\n \n",
                  item => { _content => "\n \n ", value => { _content => "1.0" } },
                },
                value => { _content => " 11 " },
              },
            }
            {
              "detail1,detail2,item,root,summary" => "no content",
              "value" => "content",
            }
            {
              root => {
                detail1 => { item => { value => "2.0" } },
                detail2 => { item => { value => "3.0" } },
                summary => { item => { value => "1.0" } },
                value => " 11 ",
              },
            }
            1.0 2.0 2.0
            unexpected 'value' at /root at xml-rules-1205065.pl line 54.

Re: XML::Rules parsing inside out?
by Jenda (Abbot) on Dec 08, 2017 at 12:03 UTC

    Yes, the normal rules work inside out. They are not meant to set global variables, they are supposed to extract, massage and return data. You should only change the global state once you are done with the branch. For example if the XML contains a list of articles, then the rules for the inner tags will massage the data into a convenient format and the rule for the articles will insert the accumulated data into the database.
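
A rough sketch of that pattern, with invented XML (the articles, article, title and author tags are hypothetical) and an in-memory array standing in for the database:

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical input; these tag names are made up for the sketch.
my $xml = <<'XML';
<articles>
  <article id="1"><title>  Inside out </title><author>alice</author></article>
  <article id="2"><title>Start rules</title><author>bob</author></article>
</articles>
XML

my @saved;    # stands in for the database

my $parser = XML::Rules->new(
    rules => [
        # Inner rules only massage data and return it to the parent tag.
        title => sub {
            my ($tag, $attrs) = @_;
            (my $text = $attrs->{_content}) =~ s/^\s+|\s+$//g;    # trim whitespace
            return $tag => $text;
        },
        author => 'content',
        # The article branch is complete here, so this is the place for side effects.
        article => sub {
            my ($tag, $attrs) = @_;
            push @saved, {
                id     => $attrs->{id},
                title  => $attrs->{title},
                author => $attrs->{author},
            };
            return;    # nothing needs to travel further up
        },
        articles => sub { return },
    ],
);
$parser->parse($xml);

printf "%s: %s (%s)\n", @{$_}{qw(id title author)} for @saved;
```

The only global touched is @saved, and only from the article rule, at which point title and author have already been delivered as keys of the hash in $_[1].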

    XML::Rules is not a normal event based parser, one that would fire events at you and expect you to take care of any bookkeeping.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      XML::Rules is not a normal event based parser, one that would fire events at you and expect you to take care of any bookkeeping.
      Depending on the XML and what I'm trying to do, sometimes I find it easier to use XML::Rules as a semi-event based parser by leveraging the start rules and setting global variables.

        Whatever works. :-)

        Jenda
