
Re: Parsing a highly nested XML file correctly and efficiently -- XML::Twig

by Discipulus (Canon)
on Jun 10, 2016 at 09:07 UTC ( [id://1165274]=note )


in reply to Parsing a highly nested XML file correctly and efficiently

Hello, if I understood your desired output correctly, you can simply chain first_child calls, like this:

    # inside the twig_handler sub for '/DatatoParse/elt'
    $_[1]->first_child('Nest1')->first_child('elt')->first_child('Junk1')->text;

This soon becomes verbose and repetitive. In fact you only need an xpath, so for Junk1 you can also write:

    my @junk1 = $_[1]->get_xpath('./Nest1/elt/Junk1');
    print $junk1[0]->text;

So when many xpaths are processed the same way, you can compact the code a lot, ending up with the following twig_handler:

    sub elt_map {
        my $elt = $_[1];
        print join ',', map {
            my @cur = $elt->get_xpath($_);
            $cur[0]->text;
        } (qw(
            d1
            d2
            ./Nest1/elt/Junk1
            ./Nest1/elt/Junk2
            ./Nest1/elt/Nest2/elt/d5/X
            ./Nest1/elt/Nest2/elt/d5/Y
            ./Nest1/elt/Nest2/elt/d6/X
            ./Nest1/elt/Nest2/elt/d6/Y
            ./Nest1/elt/Nest2/elt/Nest3/Nest4/d7/d9/d10/d11
        ));
        print "\n";
    }

The whole code will be:

    use strict;
    use warnings;
    use XML::Twig;

    my $field = "Nest1";    # not used below

    my $twig = XML::Twig->new(
        twig_handlers => { '/DatatoParse/elt' => \&elt_map, },
    );

    # slurp the whole DATA section into one string, so that parse()
    # receives a single document even if the data contains blank lines
    my $xml = do { local $/; <DATA> };
    $twig->parse($xml);

    # called once for each /DatatoParse/elt; prints one comma separated line
    sub elt_map {
        my $elt = $_[1];
        print join ',', map {
            my @cur = $elt->get_xpath($_);
            $cur[0]->text;
        } (qw(
            d1
            d2
            ./Nest1/elt/Junk1
            ./Nest1/elt/Junk2
            ./Nest1/elt/Nest2/elt/d5/X
            ./Nest1/elt/Nest2/elt/d5/Y
            ./Nest1/elt/Nest2/elt/d6/X
            ./Nest1/elt/Nest2/elt/d6/Y
            ./Nest1/elt/Nest2/elt/Nest3/Nest4/d7/d9/d10/d11
        ));
        print "\n";
    }

    __DATA__
    <DatatoParse>
    <elt>
    <d1>TV show 1</d1>
    ....

with the following output

    TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee
    TV show 4,Alias,FULL,Page 65,-2,-3,5,8,yipppeee
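
One caveat about the handler above: it calls ->text on $cur[0], so it dies as soon as one xpath matches nothing in some elt. Below is a minimal sketch of a more forgiving variant, assuming an empty field is acceptable for a missing node. The sample XML, the @paths and @rows names and the purge call are my own additions for illustration, not part of the original data.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: the second elt has no Nest1 at all.
my $xml = <<'XML';
<DatatoParse>
  <elt><d1>TV show 1</d1><d2>Heroes</d2><Nest1><elt><Junk1>FULL</Junk1></elt></Nest1></elt>
  <elt><d1>TV show 2</d1><d2>Prison Break</d2></elt>
</DatatoParse>
XML

my @paths = qw( d1 d2 ./Nest1/elt/Junk1 );
my @rows;

XML::Twig->new(
    twig_handlers => {
        '/DatatoParse/elt' => sub {
            my ( $twig, $elt ) = @_;
            push @rows, join ',', map {
                # take only the first match; undef if the xpath matched nothing
                my ($first) = $elt->get_xpath($_);
                defined $first ? $first->text : '';
            } @paths;
            $twig->purge;    # release the elements already handled
            return 1;
        },
    },
)->parse($xml);

print "$_\n" for @rows;
```

Collecting the rows in an array instead of printing inside the handler is only a convenience here; printing directly, as in the original, works the same way.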

In addition, when every print should go to a destination file, you can take advantage of select $filehandle. It is also handy while debugging: comment it out and the output goes to the screen again.
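
A small sketch of the select idea, using only core Perl. The temporary file stands in for the real destination, and the $to_file switch is a hypothetical name I introduce to play the role of "commenting the select out".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical destination: a temp file instead of the real output file.
my ( $out, $path ) = tempfile( UNLINK => 1 );

my $to_file = 1;    # set to 0 while debugging to see the output on screen

# select returns the previously selected handle, so it can be restored later;
# with no argument it just reports the current one and changes nothing.
my $previous = $to_file ? select($out) : select();

print "TV show 1,Heroes\n";    # no filehandle given: goes to the selected one

select($previous);             # back to the old default, normally STDOUT
close $out or die "cannot close $path: $!";

# read the file back to show where the line ended up
open my $in, '<', $path or die "cannot read $path: $!";
my $written = do { local $/; <$in> };
close $in;
print "file contains: $written";
```

The handler itself does not change at all: every plain print inside it follows whatever handle is currently selected.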

HtH

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
