http://qs321.pair.com?node_id=1045503

Doctrin has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear Monks. Can anyone tell me which Perl module is the fastest for parsing a really big XML file (about 5 GB, 6 million nodes)? I mean node-by-node parsers, of course. I think XML::Twig would do the trick, but I'm not sure it is the fastest one... Thanks

Replies are listed 'Best First'.
Re: Fastest XML Parser for BIG files
by daxim (Curate) on Jul 21, 2013 at 13:34 UTC
Re: Fastest XML Parser for BIG files
by ambrus (Abbot) on Jul 21, 2013 at 20:07 UTC

    Could you tell a bit more about your task besides the size of the file? Do you want to process all or most of the data in the xml file in some way, such as making aggregate statistics or converting to a different format? Or do you instead want to find only a few nodes that are easy to recognize in the XML without too much extra processing?

      I need to do complex processing of each node, retrieving some sub-nodes' values and attributes and then making DB queries with them.
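For that kind of job, a common pattern is an XML::Twig handler per repeating element that does its processing and then purges. A minimal sketch, assuming a hypothetical structure of many `<record id="...">` elements each with a `<name>` child (the element names, and the inline XML used here instead of a real 5 GB file, are illustrative only):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical structure: <records> containing many <record id="..."><name>...</name></record>.
# With a real 5 GB file you would call $twig->parsefile('big.xml') instead of ->parse($xml).
my $xml = '<records><record id="1"><name>foo</name></record>'
        . '<record id="2"><name>bar</name></record></records>';

my @rows;
my $twig = XML::Twig->new(
    twig_handlers => {
        record => sub {
            my ($t, $elt) = @_;
            # In the real job, this is where you would execute your
            # prepared DBI statement with these values.
            push @rows, [ $elt->att('id'), $elt->first_child_text('name') ];
            $t->purge;    # discard the tree built so far to cap memory use
        },
    },
);
$twig->parse($xml);
print "$_->[0]: $_->[1]\n" for @rows;
```

Because the handler fires as soon as each `<record>` is fully parsed and then purges it, memory stays roughly proportional to one record, not the whole file.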
Re: Fastest XML Parser for BIG files
by Discipulus (Canon) on Jul 22, 2013 at 07:50 UTC
    I'm new to XML processing (really, I'm quite new at everything!) but I remember a factor of 13 memory overhead when the twig is created.

    I also found this speed comparison, which you might find interesting.
    The summarized results are:
    If you want high performance: XML::Parser
    If you want relatively easy, memory-efficient parsing of huge files: XML::Twig
    If you want easy-to-implement parsing of small files: XML::Simple
    If you want a bad deal: XML::Smart
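To illustrate the high-performance end of that list: XML::Parser is a thin wrapper over expat's stream callbacks, so you never build a tree at all. A sketch, again assuming the same hypothetical `<record>`/`<name>` structure (inline XML stands in for the real file):

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical structure: many <record> elements with <name> children.
my $xml = '<records><record id="1"><name>foo</name></record>'
        . '<record id="2"><name>bar</name></record></records>';

my (@names, $in_name, $buf);
my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my (undef, $tag) = @_;
        if ($tag eq 'name') { $in_name = 1; $buf = '' }
    },
    Char  => sub { $buf .= $_[1] if $in_name },   # text may arrive in chunks
    End   => sub {
        my (undef, $tag) = @_;
        if ($tag eq 'name') { push @names, $buf; $in_name = 0 }
    },
});
$parser->parse($xml);    # or ->parsefile('big.xml') for the real thing
print join(', ', @names), "\n";
```

The trade-off is exactly the one the comparison names: you get raw speed and flat memory, but you have to track state (like `$in_name` above) by hand instead of getting a convenient per-element tree.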
    

    L*
    there are no rules, there are no thumbs..
Re: Fastest XML Parser for BIG files
by Preceptor (Deacon) on Jul 21, 2013 at 19:10 UTC

    Can't comment on speed, but I find XML::Twig's ability to do twig->purge to free memory as you go invaluable once you start parsing large files - I seem to recall the rule of thumb is that you need to assume a 10x memory overhead when XML parsing.
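The purge-as-you-go pattern pairs well with twig_roots, which tells XML::Twig to build a tree only for the elements you care about. A minimal sketch, where `<item>` is a stand-in for whatever repeating element your file actually contains:

```perl
use strict;
use warnings;
use XML::Twig;

my $count = 0;
my $twig = XML::Twig->new(
    twig_roots => {                 # build a tree only for matching elements
        item => sub {
            my ($t, $elt) = @_;
            $count++;               # ... process $elt here ...
            $t->purge;              # then release the memory before moving on
        },
    },
);
$twig->parse('<list><item/><item/><item/></list>');
print "$count items\n";
```

Everything outside the matched elements is skipped entirely, and each matched element is freed right after its handler runs, so the 10x overhead only ever applies to one item at a time.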