PerlMonks  

XML(::Simple) Parsing Efficiency

by billyak (Friar)
on Jun 28, 2002 at 16:12 UTC ( [id://178052]=perlquestion )

billyak has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a project where I'd like to work with some simple (less than 2KB) XML files. I've been happily chugging along with XML::Simple in previous projects but this one requires that total run time is as low as possible.

Just for testing purposes, I used the XML output of a Shoutcast server. It is about 5KB, more than twice the size of my files. I measured times with Time::HiRes. On WinXP with an AMD XP 1700 and 512MB of PC2100 RAM, it took .16 seconds to parse the file with XML::Simple. I was shocked. I was expecting maybe two or three hundredths of a second, not nearly two tenths. Chopping out some of the contents of the XML file, I cut the size in half. Same thing, .16 seconds.

At this point I'm freaking out, thinking that the Shoutcast XML output must be malformed and giving XML::Simple a hard time. I caved in and used the example straight out of the XML::Simple POD. 0.16 seconds. That really isn't acceptable for this application. I've tried messing with XML::Parser, but ::Simple's limitations (except execution time) are absolutely no issue to me.

If I use XML::Parser, removing the ::Simple overhead, will I notice a speed improvement? Is there an XML module popularly used specifically for its speed?
I've looked a bit at XML::Parser and it seems 95% overkill for my needs, but if figuring out how to adapt it will save me time, I'll be going with that option.

While I'm here .. ;) .. is there any benefit to keeping the XML chunks in a MySQL table? My thinking is that over time, as the number of XML files increases, the seek time for that table will increase. Keeping individual files will put the seeking on a lower level and keep it quick.

Thanks,
-billyak

Replies are listed 'Best First'.
Re: XML(::Simple) Parsing Efficiency
by mirod (Canon) on Jun 28, 2002 at 16:57 UTC

    The latest version of XML::Simple, 1.08_01, which is not yet stable but should be soon I hope, can use any SAX parser. So if you install it, XML::SAX and XML::LibXML, and set XML::SAX to use that parser, you should get an increase in performance.
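
A minimal sketch of that setup, assuming XML::Simple 1.08_01 or later, XML::SAX and XML::LibXML are all installed. `$XML::SAX::ParserPackage` is the variable that XML::SAX's parser factory consults; the sample document is made up for illustration. Whether this is actually faster on a given machine is something to measure rather than assume.

```perl
use strict;
use warnings;
no warnings 'once';    # the package variable below is named only once here

# Check at run time that the optional modules are present (assumption:
# they may not be installed everywhere).
my $have_all = eval {
    require XML::Simple;
    require XML::SAX;
    require XML::LibXML::SAX;
    1;
};

# Hypothetical sample document.
my $xml = '<config logdir="/var/log/foo/" debugfile="/tmp/foo.debug"/>';

my $config;
if ($have_all) {
    # Ask XML::SAX's parser factory for the libxml2-backed SAX parser.
    local $XML::SAX::ParserPackage = 'XML::LibXML::SAX';
    $config = XML::Simple::XMLin($xml);
    print "logdir: $config->{logdir}\n";
}
else {
    print "XML::Simple, XML::SAX or XML::LibXML is missing\n";
}
```

XML::Simple's documentation also describes a `$XML::Simple::PREFERRED_PARSER` variable for the same purpose, which may be the more convenient knob if the choice should apply to the whole program.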

Re: XML(::Simple) Parsing Efficiency
by flounder99 (Friar) on Jun 28, 2002 at 16:36 UTC
    Something sounds fishy (no pun intended) since your execution time does not change. Here are some suggestions:

    1) Try reparsing the same data $x times in a loop and see if you get $x * .16 the execution time.

    2) Show us your code - there may be something else happening.

    3) Try using Benchmark

    -UPDATE-

    Looks like it is definitely start-up time. (using XML::Simple's pod example)

        use Benchmark;
        use XML::Simple;
        use Time::HiRes qw( gettimeofday tv_interval );

        my $t0 = [gettimeofday];
        my $config = XMLin();
        my $elapsed = tv_interval ( $t0 );
        print "First Parsing took $elapsed seconds\n";

        $config = undef;
        $t0 = [gettimeofday];
        $config = XMLin();
        $elapsed = tv_interval ( $t0 );
        print "Second Parsing took $elapsed seconds\n";

        timethis (1000, '$config = undef; $config = XMLin()');

        __OUTPUT__
        First Parsing took 0.312 seconds
        Second Parsing took 0.016 seconds
        timethis 1000: 6 wallclock secs ( 5.13 usr + 0.53 sys = 5.66 CPU) @ 176.80/s (n=1000)

    --

    flounder

      Interesting results. This speedup may be due more to decent caching than to initialization. What happens if you parse different XML content each iteration?

      -Mark
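
One way to try that question is sketched below: each iteration parses a different, generated document, so any per-document caching cannot help. The `make_doc` helper and its content are hypothetical, and the benchmark only runs if XML::Simple is installed.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Build a small, distinct XML document for iteration $i (hypothetical content).
sub make_doc {
    my ($i) = @_;
    return qq{<config run="$i"><server name="host$i" address="10.0.0.$i"/></config>};
}

my @docs = map { make_doc($_) } 1 .. 200;

if ( eval { require XML::Simple; 1 } ) {
    my $next = 0;

    # Parse a different document on every call; the modulo guards the index.
    my $t = timeit( scalar @docs,
        sub { XML::Simple::XMLin( $docs[ $next++ % @docs ] ) } );
    print "distinct documents: ", timestr($t), "\n";
}
else {
    print "XML::Simple is not installed; skipping benchmark\n";
}
```

If the per-parse time stays close to the repeated-document figure, the first-call cost is more likely module and parser start-up than caching of the content.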

Re: XML(::Simple) Parsing Efficiency
by kvale (Monsignor) on Jun 28, 2002 at 16:37 UTC
    The time may not be so out of line. From the XML::Simple pod example, I wrote
        use XML::Simple;
        use Data::Dumper;
        use Time::HiRes qw( gettimeofday tv_interval );

        my $t0 = [gettimeofday];
        my $config = XMLin();
        my $elapsed = tv_interval ( $t0 );
        print "Parsing took $elapsed seconds\n";
        print Dumper($config);

    which resulted in

        Parsing took 0.567832 seconds
        $VAR1 = {
            'debugfile' => '/tmp/foo.debug',
            'logdir' => '/var/log/foo/',
            'server' => {
                'sahara' => {
                    'address' => [ '10.0.0.101', '10.0.1.101' ],
                    'osversion' => '2.6',
                    'osname' => 'solaris'
                },
                'gobi' => {
                    'address' => '10.0.0.102',
                    'osversion' => '6.5',
                    'osname' => 'irix'
                },
                'kalahari' => {
                    'address' => [ '10.0.0.103', '10.0.1.103' ],
                    'osversion' => '2.0.34',
                    'osname' => 'linux'
                }
            }
        };

    on my old DEC, er, Compaq, er, HP Alpha.

    XML::Simple is layered atop XML::Parser so I'd expect XML::Parser to be a little quicker, but not much.

    -Mark

Re: XML(::Simple) Parsing Efficiency
by Aristotle (Chancellor) on Jun 28, 2002 at 16:59 UTC

    I'm not entirely sure here, but it sounds like what you're measuring is actually load-up time, while the parsing itself takes next to no time.
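
A rough way to check this is to load the module at run time and time that step separately from the parses. The sketch below assumes XML::Simple may or may not be installed; the `timed` helper and the sample document are made up for illustration. Note that the first parse can still include one-off work such as loading the underlying parser.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code ref and return (elapsed_seconds, result).
sub timed {
    my ($code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    return ( tv_interval($t0), $result );
}

# Hypothetical sample document, standing in for the real input.
my $xml = '<config logdir="/var/log/foo/"><server name="gobi" osname="irix"/></config>';

# Load the module at run time so its load cost is measured on its own.
my ( $load_time, $loaded ) = timed( sub { eval { require XML::Simple; 1 } } );

if ($loaded) {
    my ($first)  = timed( sub { XML::Simple::XMLin($xml) } );
    my ($second) = timed( sub { XML::Simple::XMLin($xml) } );
    printf "module load:  %.4f s\n", $load_time;
    printf "first parse:  %.4f s\n", $first;
    printf "second parse: %.4f s\n", $second;
}
else {
    print "XML::Simple is not installed; nothing to measure\n";
}
```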

    As far as seek time is concerned, the difference is negligible for few files but is going to grow in favour of the database as the number of your files grows beyond several thousand - unless you're using something like ReiserFS for the filesystem. The problem is that typical filesystems scan directories linearly to resolve a filename to the corresponding inode number. Other than that, no method wins very much over the other - you have to pick the files from somewhere off the disk and that means seeking.

    Makeshifts last the longest.

storing XML in files vs table
by cebrown (Pilgrim) on Jun 28, 2002 at 17:56 UTC
    Another consideration for where to store the XML (file vs table) is how you will be retrieving the data. Not knowing the specifics of your application, I can only say generically that if you can provide an index to your data, a database will scale much, much better than files.

    Final idea -- if you are very worried about execution time of the XML parser, perhaps you could store the result (in a Data::Dumper style hash, say) of an XML parse, rather than the raw XML itself.
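
A minimal sketch of that idea, using the core Storable module instead of a Data::Dumper dump (a substitution, since Storable reads back without an `eval`). The `load_cached` helper, the file names and the stand-in parser are all hypothetical; the parser is passed in as a code ref so the caching logic does not depend on any particular XML module.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Return the parsed structure for $xml_file, re-parsing only when the
# cache file is missing or older than the XML file.
# $parse is a code ref that turns a file name into a Perl data structure.
sub load_cached {
    my ( $xml_file, $cache_file, $parse ) = @_;

    if ( -e $cache_file && -M $cache_file <= -M $xml_file ) {
        return retrieve($cache_file);    # fast path: no XML parsing
    }

    my $data = $parse->($xml_file);
    store( $data, $cache_file );
    return $data;
}

# Demo with a throwaway file and a stand-in parser that counts its calls.
my $dir      = tempdir( CLEANUP => 1 );
my $xml_file = "$dir/demo.xml";
open my $fh, '>', $xml_file or die "cannot write $xml_file: $!";
print {$fh} '<config logdir="/var/log/foo/"/>';
close $fh;

my $calls = 0;
my $parse = sub { $calls++; return { logdir => '/var/log/foo/' } };

my $first  = load_cached( $xml_file, "$xml_file.cache", $parse );
my $second = load_cached( $xml_file, "$xml_file.cache", $parse );
print "parser ran $calls time(s)\n";
```

With XML::Simple installed, the stand-in could be replaced by something like `sub { XML::Simple::XMLin( $_[0] ) }`. XML::Simple's documentation also describes a `Cache` option with a `storable` scheme, which may cover the same ground without extra code.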
