PerlMonks  

Re: Parsing a highly nested XML file correctly and efficiently

by tangent (Parson)
on Jun 09, 2016 at 19:08 UTC


in reply to Parsing a highly nested XML file correctly and efficiently

Here is a way to do it using XML::LibXML. Because each XPath expression is evaluated relative to the node you call it on, you should be able to match your nesting exactly.
    use XML::LibXML;

    # $xml holds the XML document from the original question
    my $doc = XML::LibXML->load_xml(string => $xml);

    # Outer records
    my @nodes = $doc->findnodes('DatatoParse/elt');
    for my $node ( @nodes ) {
        my $d1 = $node->findvalue('d1');
        my $d2 = $node->findvalue('d2');

        # Inner records, relative to the current outer record
        my @xnodes = $node->findnodes( 'Nest1/elt/Nest2/elt' );
        for my $xnode ( @xnodes ) {
            my $d5x = $xnode->findvalue( 'd5/X' );
            my $d5y = $xnode->findvalue( 'd5/Y' );
            my $d6x = $xnode->findvalue( 'd6/X' );
            my $d6y = $xnode->findvalue( 'd6/Y' );
            my $d10 = $xnode->findvalue( 'Nest3/Nest4/d7/d9/d10' );

            # Trim leading and trailing whitespace
            $d10 =~ s/^\s+//;
            $d10 =~ s/\s+$//;

            print "$d1,$d2,$d5x,$d5y,$d6x,$d6y,$d10\n";
        }
    }
Output:
    TV show 1,Heroes,-2,-3,5,8,yipppeee
    TV show 1,Heroes,-2,-3,5,8,yipppeee
    TV show 1,Heroes,-2,-3,5,8,yipppeee
    TV show 1,Heroes,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,-2,-3,5,8,yipppeee
    TV show 4,Alias,-2,-3,5,8,yipppeee
    TV show 4,Alias,-2,-3,5,8,yipppeee
    TV show 4,Alias,-2,-3,5,8,yipppeee
    TV show 4,Alias,-2,-3,5,8,yipppeee
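
If you want to try the same idea in isolation, here is a minimal self-contained sketch. The sample XML is hypothetical: the original input is not reproduced in this thread, so the structure is only inferred from the XPath expressions used above, and it contains a single record. As a variation, it uses the XPath function normalize-space() in place of the two regex substitutions. Note that normalize-space() also collapses internal runs of whitespace, so it is not an exact equivalent if d10 can contain multiple spaces that matter.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input, inferred from the XPath expressions;
# the real file from the original question may differ.
my $xml = <<'XML';
<DatatoParse>
  <elt>
    <d1>TV show 1</d1>
    <d2>Heroes</d2>
    <Nest1><elt><Nest2>
      <elt>
        <d5><X>-2</X><Y>-3</Y></d5>
        <d6><X>5</X><Y>8</Y></d6>
        <Nest3><Nest4><d7><d9><d10>
          yipppeee
        </d10></d9></d7></Nest4></Nest3>
      </elt>
    </Nest2></elt></Nest1>
  </elt>
</DatatoParse>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @rows;
for my $node ( $doc->findnodes('/DatatoParse/elt') ) {
    # Relative paths are evaluated from the current outer <elt>
    my @outer = map { $node->findvalue($_) } qw(d1 d2);

    for my $xnode ( $node->findnodes('Nest1/elt/Nest2/elt') ) {
        # ...and these from the current inner <elt>
        my @inner = map { $xnode->findvalue($_) } qw(d5/X d5/Y d6/X d6/Y);

        # normalize-space() strips leading/trailing whitespace
        # and collapses internal whitespace runs to one space
        my $d10 = $xnode->findvalue('normalize-space(Nest3/Nest4/d7/d9/d10)');

        push @rows, join ',', @outer, @inner, $d10;
    }
}

print "$_\n" for @rows;
```

Collecting the rows in an array before printing is only a convenience for this sketch; printing inside the inner loop, as in the code above, works just as well.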

Re^2: Parsing a highly nested XML file correctly and efficiently
by Ppeoc (Beadle) on Jun 17, 2016 at 00:27 UTC
    Thank you for the genius solution. It is so simple and yet efficient. Exactly what I was looking for.
