Data::Dumper Efficiency Problem

madhatter has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Data::Dumper for variable persistence, with Indent set to 0. What is up with this? Why does its CPU/RAM usage blow up (the dump step goes from about 0.001 s to about 49 s) when the script parses only twice as many log lines? I could understand an increase of several seconds, but this is ridiculous. Is there something I'm missing?

Here are two runs, the second over roughly twice as many lines:

C:\Perl>perl stats.pl
0.0400      Data Loaded
0.4200      Lines Parsed: 10347
0.4210      Data Dumped
0.8810      TOTAL ; Finished Execution

C:\Perl>perl stats.pl
0.0400      Data Loaded
1.7620      Lines Parsed: 20337
50.5330     Data Dumped
52.3350     TOTAL ; Finished Execution

.. and the relevant code:

use Data::Dumper;
$Data::Dumper::Indent = 0;
$encoding = Data::Dumper->Dump(    [\%DATA    ],
                [qw(*DATA)]);

%DATA is a hash of hashes of hashes of strings and integers, basically. (heh)
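A rough way to check how the dump step scales on a given perl is to time it on synthetic data of the same shape at two sizes. This is a sketch under assumptions: `make_data` and its host/page/field layout are hypothetical stand-ins for the real log data. If the dump time roughly doubles when the input doubles, the scaling is linear; a much larger jump reproduces the problem described above.

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes ();

# Hypothetical generator: a hash of hashes of hashes with $n leaf
# entries spread over 50 "hosts" (strings and integers at the bottom).
sub make_data {
    my ($n) = @_;
    my %data;
    for my $i (1 .. $n) {
        $data{ "host" . ($i % 50) }{"/page$i.html"} = { hits => $i, agent => "agent$i" };
    }
    return \%data;
}

$Data::Dumper::Indent = 0;
for my $n (10_000, 20_000) {
    my $data    = make_data($n);
    my $start   = Time::HiRes::time();
    my $out     = Data::Dumper->Dump([$data], ['*DATA']);
    my $elapsed = Time::HiRes::time() - $start;
    printf "%6d entries: %.3f s, %d bytes\n", $n, $elapsed, length $out;
}
```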

If nothing can be done about this, what sort of (more efficient) alternatives are there? Preferably something as simple as dumping to a file and then running:

do "data.dat";

Thanks,
madhatter
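The dump-then-`do` round trip the question asks about can be sketched as follows. The sample contents of `%DATA` and the file name `data.dat` are assumptions for illustration. Two details matter: `%DATA` must be a package variable so the loaded file can assign to it, and on perl 5.26 and later `do` no longer searches the current directory, so the path needs a `./` prefix.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the poster's %DATA (a package variable,
# so that code loaded with do() can assign to it).
our %DATA = (
    'www.example.com' => { '/index.html' => { hits => 12, bytes => 4096 } },
);

# Write: Indent = 0 keeps the file compact; the '*DATA' name makes
# Data::Dumper emit "%DATA = (...);" instead of "$VAR1 = {...};".
$Data::Dumper::Indent = 0;
open my $out, '>', 'data.dat' or die "Cannot write data.dat: $!";
print $out Data::Dumper->Dump([\%DATA], ['*DATA']);
close $out or die "Cannot close data.dat: $!";

# Read back: do() runs the file as Perl code. The "./" prefix matters
# on perl 5.26 and later, where "." is no longer in @INC.
%DATA = ();
do './data.dat';
die "Could not load data.dat: ", ($@ || $!) if $@ || !%DATA;
print $DATA{'www.example.com'}{'/index.html'}{hits}, "\n";   # 12
```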

(tye)Re: Data::Dumper Efficiency Problem by tye (Sage) on Jan 04, 2001 at 00:03 UTC
My partially wild guess is that Data::Dumper's stuffing everything into one big string causes lots of realloc()s, which can't be done in place because Perl malloc()s other things in between, so the growing string is repeatedly copied to new places where there is enough room to hold it all in one piece. The correct solution is for Data::Dumper to be fixed so it knows how to write to a Perl file handle! - tye (but my friends call me "Tye")
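Until Data::Dumper can write to a file handle itself, one workaround in the spirit of this reply is to dump each top-level entry as its own statement and print it straight to the file, so only one entry's text is held in memory at a time. This is a sketch: the sample data is hypothetical, and whether it avoids the slowdown depends on the realloc() guess above being the real cause.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data; the poster's real %DATA would go here.
our %DATA = (
    host1          => { '/index.html' => { hits => 10, bytes => 2048 } },
    "o'reilly.com" => { '/about.html' => { hits => 3,  bytes => 512  } },
);

$Data::Dumper::Indent = 0;

open my $fh, '>', 'data.dat' or die "Cannot write data.dat: $!";
print $fh "%DATA = ();\n";
for my $key (sort keys %DATA) {
    # Escape \ and ' so the key is a valid single-quoted Perl literal.
    (my $quoted = $key) =~ s/([\\'])/\\$1/g;
    # One statement per top-level entry, e.g. $DATA{'host1'} = {...};
    # Data::Dumper prefixes the "$" sigil to the name itself.
    print $fh Data::Dumper->Dump([ $DATA{$key} ], [ "DATA{'${quoted}'}" ]), "\n";
}
close $fh or die "Cannot close data.dat: $!";
```

The file still loads with `do './data.dat'`, because each line is an ordinary assignment into the package variable `%DATA`.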
Re (tilly) 1: Data::Dumper Efficiency Problem by tilly (Archbishop) on Jan 04, 2001 at 00:08 UTC
You may want to look at Storable.
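A minimal Storable round trip looks like this (the sample data and the file name `data.sto` are assumptions). Storable writes a binary image rather than Perl source, so the file is not human-readable and is reloaded with `retrieve` instead of `do`.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical sample data in the shape described in the question.
my %DATA = (
    host1 => { '/index.html' => { hits => 10, bytes => 2048 } },
);

# nstore writes in network byte order, so the file can be read on other
# platforms; plain store() uses the native format and is not portable.
nstore(\%DATA, 'data.sto') or die "Cannot store data.sto";

# retrieve() returns a reference to a fresh copy of the structure.
my $restored = retrieve('data.sto');
%DATA = %$restored;
print $DATA{host1}{'/index.html'}{hits}, "\n";   # 10
```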
Re: Data::Dumper Efficiency Problem by Trinary (Pilgrim) on Jan 04, 2001 at 00:12 UTC
I used to swear by Data::Dumper, but I don't anymore. I'm not sure about the internals, but I have to say that recently it has really come to frustrate me. Doing performance analysis under Win32 (Win32::PerfLib, not on CPAN), I end up dumping large hash structures all the time. While researching the format of these things, I tried to get a dump of one base-level object (System, for those who care). It ran out of memory, swap... everything. It wouldn't finish running, and it was using well over 200M of memory. Since then I have written my own (somewhat dumb) replacement, which took an hour or two, and I suggest either following suit or searching around here for something that has enough functionality for what you need and is simpler than Data::Dumper. If there's interest, I'll post my lil snippet, but it's basically trivial. Trinary
Re: Re: Data::Dumper Efficiency Problem by madhatter (Sexton) on Jan 04, 2001 at 00:19 UTC
Trinary, please do post! I'm very interested in this. Thanks, madhatter
Re: Re: Re: Data::Dumper Efficiency Problem by Trinary (Pilgrim) on Jan 04, 2001 at 00:30 UTC
Ask, and ye shall receive: this is just a sub, pretty basic actually and probably broken in a couple of ways. It takes a ref as its argument and starts printing. I haven't done any performance testing vs. Data::Dumper.

Begin code

use strict;
use warnings;

# Print a nested hash/array structure as Perl-like source on STDOUT.
# $levels is the current nesting depth and only controls indentation.
sub dumpref {
    my ($testref, $levels) = @_;
    $levels ||= 0;

    if (ref($testref) eq 'HASH') {
        print "{\n";
        $levels++;
        my $maxlevel = scalar(keys %$testref);
        my $curlevel = 0;
        foreach my $key (keys %$testref) {
            $curlevel++;
            (my $qkey = $key) =~ s/([\\'])/\\$1/g;   # quote keys too
            print " " x $levels, "'$qkey' => ";
            my $val = $testref->{$key};
            if (ref($val)) {
                dumpref($val, $levels);
            } elsif (!defined $val) {
                print "undef";
            } else {
                $val =~ s/([\\'])/\\$1/g;   # escape every \ and ', not just the first
                print "'$val'";
            }
            print "," if $curlevel < $maxlevel;
            print "\n";
        }
        print " " x ($levels - 1) . "}";
    }
    elsif (ref($testref) eq 'ARRAY') {
        print "[\n";
        $levels++;
        my $maxlevel = scalar(@$testref);
        my $curlevel = 0;                 # was undeclared in this branch
        foreach my $elem (@$testref) {
            $curlevel++;
            print " " x $levels;
            if (ref($elem)) {
                dumpref($elem, $levels);
            } elsif (!defined $elem) {
                print "undef";
            } else {
                # Copy first: $elem is an alias, so s/// on it would
                # modify the caller's array.
                (my $val = $elem) =~ s/([\\'])/\\$1/g;
                print "'$val'";
            }
            print "," if $curlevel < $maxlevel;
            print "\n";
        }
        print " " x ($levels - 1) . "]";
    }
    else {
        print ref($testref), "\n";
    }
}

End Code

Use at your own risk, but it handles basic stuff OK, I think. =b Trinary
Re: Re: Re: Re: Data::Dumper Efficiency Problem by petral (Curate) on Jan 05, 2001 at 03:41 UTC
Re: Data::Dumper Efficiency Problem by repson (Chaplain) on Jan 04, 2001 at 06:34 UTC
Another option, depending on your data, is XML::Simple. XMLout can take a filename or filehandle, which may reduce memory use by writing output immediately instead of building it all up first (I don't know whether it actually does). XMLin is supposed to always recreate the original data structure... It also gives you buzzword compliance, and a structure that can be parsed without Perl. As to your original question, Data::Dumper may be creating self-referential output, which means it has to remember and constantly check everything that has already passed through it. Read the module docs to find out whether this may be happening and what you should do about it (possibly call `$OBJ->Reset` under the OO interface, depending on how you are doing things).
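For reference, the OO interface mentioned here looks roughly like this; the sample data is hypothetical. `Reset` clears the object's internal table of references it has already seen, which matters mainly when one Data::Dumper object is reused for several dumps. For plain tree-shaped data with no shared references it does not change the output, so whether seen-reference tracking explains the slowdown remains a guess.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data.
our %DATA = (host1 => { '/index.html' => { hits => 10 } });

# Configure once, dump, then clear the "seen" table so a reused
# object starts fresh on the next Dump.
my $dumper = Data::Dumper->new([\%DATA], ['*DATA']);
$dumper->Indent(0);
my $text = $dumper->Dump;
$dumper->Reset;

print "$text\n";
```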
Re: Re: Data::Dumper Efficiency Problem by mirod (Canon) on Jan 04, 2001 at 06:49 UTC
This is not the way XML::Simple works. XML::Simple is designed to let you read an XML file (with some restrictions), use the data it contains or update it, and write it back out. Although I have never tried it, I would bet it will not output arbitrary data structures as XML (although that might be fun!). On the other hand, XML::Dumper and Data::DumpXML will dump data to XML. I have no idea how fast they are, though (and considering Data::DumpXML is also written by Gisle Aas, I don't think it will be faster than Data::Dumper).
Re: Re: Re: Data::Dumper Efficiency Problem by repson (Chaplain) on Jan 04, 2001 at 07:00 UTC
Directly from the XML::Simple docs: `XMLout()` `Takes a data structure (generally a hashref) and returns an XML encoding of that structure. If the resulting XML is parsed using XMLin(), it will return a data structure equivalent to the original.` That sounds similar to what Data::Dumper is being used for here.
Re: Re: Re: Re: Data::Dumper Efficiency Problem by mirod (Canon) on Jan 04, 2001 at 10:19 UTC
Re: Re: Re: Re: Re: Data::Dumper Efficiency Problem by repson (Chaplain) on Jan 05, 2001 at 05:03 UTC

