Issue with looping through XML::LibXML::Reader

by ozguy (Novice)
on Jan 20, 2012 at 20:47 UTC

ozguy has asked for the wisdom of the Perl Monks concerning the following question:

Hey gurus,

Long time Perl user, first time XML::LibXML::Reader user, and I've been banging my head against it for a number of days now.

The info on the module page isn't that great, but I've learned a lot through searching the monastery.

I have been able to get most of the fields out of the following data, but am stuck on one of them.

Here is an example of the XML data:

<FIXML r="20030618" s="20040109" v="4.4" xr="FIA" xv="1">
  <Batch>
    <MktDataFull RptID="13793742" BizDt="2011-12-23">
      <Instrmt Sym="MID" MMY="20120317"/>
      <Full Typ="5" Px="5.303128"/>
      <Full Typ="D" Px="884.91"/>
    </MktDataFull>
    <MktDataFull RptID="14536119" BizDt="2011-12-23">
      <Instrmt Sym="MID" MMY="20120218"/>
      <Full Typ="5" Px="214.007661"/>
      <Full Typ="D" Px="884.91"/>
    </MktDataFull>
  </Batch>
</FIXML>

I have been able to get all the data except each record's RptID and BizDt (the code seems to capture only the first record's RptID value, not the subsequent ones) with the following code:

my $reader = new XML::LibXML::Reader(location => "$XMLfile")
    or die "cannot read $XMLfile\n";

while ( $reader->nextElement( 'MktDataFull' )) {
    my $RptID = $reader->getAttribute('RptID');
    my $BizDt = $reader->getAttribute('BizDt');
    $reader->read;
    while ( $reader->nextElement( 'Instrmt' )) {
        my $Sym = $reader->getAttribute('Sym');
        my $MMY = $reader->getAttribute('MMY');
        $reader->read;
        while (1) {
            if ($reader->localName eq 'Full') {
                $Typ = $reader->getAttribute('Typ');
            }
            $reader->nextSibling() > 0 or last;
        }
        $fileLine = $RptID . "," . $BizDt . "," . $Sym . "," . $MMY . "," . $MatDt . "," . $CFI_recType . "," . $Typ4Dt;
        print CSVOUT "$fileLine\n";
    }
    $reader->nextSibling() > 0 or last;
}

For some reason it's only reading the attributes (RptID & BizDt) of the first "MktDataFull" record, and not those of any of the others (there are half a million records per file).

What am I doing wrong? Any advice will be greatly appreciated.

On a side note, the file I'm reading has half a million records in it, and is 150MB in size. Is there any other XML parser that would be easier and/or faster for this task?

Once again, any advice will be greatly appreciated, gurus.

Replies are listed 'Best First'.
Re: Issue with looping through XML::LibXML::Reader
by jmcnamara (Monsignor) on Jan 21, 2012 at 02:34 UTC

    The problem seems to be that you are reading the next Instrmt element in a while loop and as a result you skip the parent MktDataFull elements.

    If you change the while() to an if() it should fix the main issue.

    ...
    if ( $reader->nextElement( 'Instrmt' ) ) {
        my $Sym = $reader->getAttribute( 'Sym' );
        ...

    --
    John.

      As easy as that..... Thank you so much for the quick and helpful response John.

      I also had to comment out one of the last read requests ($reader->nextSibling() > 0 or last;), and it now works as it should.
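
For reference, a minimal sketch of one way to avoid the cursor bookkeeping discussed above: stay on each MktDataFull element, copy it into a small DOM subtree with copyCurrentNode(1), and read the children from that copy. This is not the loop from the thread. The helper name mkt_rows, the inline sample string, and the choice to emit one row per Full element (with Typ and Px) are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Inline copy of the sample data from the question (assumption: a real run
# would use location => $XMLfile instead of a string).
my $xml = <<'XML';
<FIXML r="20030618" s="20040109" v="4.4" xr="FIA" xv="1">
  <Batch>
    <MktDataFull RptID="13793742" BizDt="2011-12-23">
      <Instrmt Sym="MID" MMY="20120317"/>
      <Full Typ="5" Px="5.303128"/>
      <Full Typ="D" Px="884.91"/>
    </MktDataFull>
    <MktDataFull RptID="14536119" BizDt="2011-12-23">
      <Instrmt Sym="MID" MMY="20120218"/>
      <Full Typ="5" Px="214.007661"/>
      <Full Typ="D" Px="884.91"/>
    </MktDataFull>
  </Batch>
</FIXML>
XML

# Hypothetical helper: returns one CSV line per Full element.
sub mkt_rows {
    my ($reader) = @_;
    my @rows;

    # The only cursor movement is this single nextElement call, so the
    # parent record can never be skipped by an inner loop.
    while ( $reader->nextElement('MktDataFull') ) {
        # Deep-copy just this record into a DOM subtree.
        my $record = $reader->copyCurrentNode(1);

        my ($instrmt) = $record->getChildrenByTagName('Instrmt');
        next unless $instrmt;    # skip records without an Instrmt child

        for my $full ( $record->getChildrenByTagName('Full') ) {
            push @rows, join ',',
                $record->getAttribute('RptID'),
                $record->getAttribute('BizDt'),
                $instrmt->getAttribute('Sym'),
                $instrmt->getAttribute('MMY'),
                $full->getAttribute('Typ'),
                $full->getAttribute('Px');
        }
    }
    return @rows;
}

my $reader = XML::LibXML::Reader->new( string => $xml )
    or die "cannot parse sample XML\n";
print "$_\n" for mkt_rows($reader);
```

With a file, the reader would be built with location => $XMLfile as in the original post. Only one record is expanded into a DOM at a time, so memory use should stay small even for the 150MB input.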

Re: Issue with looping through XML::LibXML::Reader
by runrig (Abbot) on Jan 20, 2012 at 21:37 UTC
    First, if you're not already using them, I recommend use strict and use warnings, as your code currently has a few uninitialized variables of questionable origin. Next, that module seems clunky for this purpose; I'd use XML::Rules (I don't print out the same thing as you, since I don't know where some of your values are supposed to come from):
    use strict;
    use warnings;
    use XML::Rules;

    my $xml = <<XML;
    <FIXML r="20030618" s="20040109" v="4.4" xr="FIA" xv="1">
      <Batch>
        <MktDataFull RptID="13793742" BizDt="2011-12-23">
          <Instrmt Sym="MID" MMY="20120317"/>
          <Full Typ="5" Px="5.303128"/>
          <Full Typ="D" Px="884.91"/>
        </MktDataFull>
        <MktDataFull RptID="14536119" BizDt="2011-12-23">
          <Instrmt Sym="MID" MMY="20120218"/>
          <Full Typ="5" Px="214.007661"/>
          <Full Typ="D" Px="884.91"/>
        </MktDataFull>
      </Batch>
    </FIXML>
    XML

    my @rules = (
      MktDataFull => sub {
        my $data = $_[1];
        for my $full (@{$data->{Full}}) {
          print join(",", @$data{qw(RptID BizDt Sym MMY)}, @$full{qw(Typ Px)}), "\n";
        }
        return;
      },
      Instrmt => 'pass',
      Full => 'as array',
    );

    my $xr = XML::Rules->new(rules => \@rules, stripspaces => 3);
    $xr->parse($xml);
