Reinventing the wheel: Dumper Difficulties

Recently I have been routinely committing that most grave crime of reinventing wheels. The wheels are Data::Dumper and Data::Dump.

I can hear the hiss of shock. Dumper!? Why Dumper? It's been a standard module forever. It was written by G. Sarathy. It's tried and true. (Ditto for G. Aas's Dump module.)

Well. The first answer is that I don't like their output. Either of them. They do a depth-first traversal of their data structures, which, while fast and efficient, renders cyclic and self-referential data structures utterly unrecognizable, and usually completely incomprehensible. What I want (and have written) is a dumper that does a breadth-first traversal, so that if an object is referenced at multiple places in a data structure it appears in the dump at the highest level at which it is mentioned, not wherever it is first encountered in a depth-first traversal.
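
To make the difference concrete, here is a minimal sketch of a breadth-first walk that records the shallowest level at which each referenced thing can be reached. It is not Data::BFDump's actual code; the helper names child_refs and shallowest_depths are made up for illustration, and only arrays, hashes and references to references are followed.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Return the references held directly by an array, a hash,
# or a reference to a reference. (Hypothetical helper.)
sub child_refs {
    my ($ref) = @_;
    my $type = reftype($ref) // '';
    my @kids = $type eq 'ARRAY' ? @$ref
             : $type eq 'HASH'  ? map { $ref->{$_} } sort keys %$ref
             : $type eq 'REF'   ? ($$ref)
             :                    ();
    return grep { ref } @kids;
}

# Breadth-first walk: map the address of every reachable thing to the
# shallowest depth at which it can be reached from the root.
sub shallowest_depths {
    my ($root) = @_;
    my %depth = ( refaddr($root) => 0 );
    my @queue = ($root);
    while ( defined( my $ref = shift @queue ) ) {
        for my $kid ( child_refs($ref) ) {
            my $addr = refaddr($kid);
            next if exists $depth{$addr};    # already reached at a shallower level
            $depth{$addr} = $depth{ refaddr($ref) } + 1;
            push @queue, $kid;
        }
    }
    return \%depth;
}

# $shared is buried three levels down on one path,
# but is also a direct element of the top-level array.
my $shared = { name => 'shared' };
my $data   = [ { a => { b => $shared } }, $shared ];

my $depths       = shallowest_depths($data);
my $shared_depth = $depths->{ refaddr($shared) };
print "shared is first reachable at depth $shared_depth\n";    # depth 1
```

A depth-first walk over the same structure would meet $shared first at depth 3, via $data->[0]{a}{b}, which is exactly the placement complained about above.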

But it turns out that there are even better reasons. Both are buggy.

Data::Dumper, while a tried and true workhorse, has a number of problems. Under MS it has a memory leak and cannot handle large data structures. But even worse, it has a very subtle bug in how it outputs references to scalars. This can be seen in the following very simple example:

use Data::Dumper;
my ($x,$y);
$y='Foo';
$x=\$y;
print Dumper($x);
__END__
#Outputs
$VAR1=\'Foo';

Do you see the error? It's not obvious, but it's serious. The bug is that we can do this: $$x='Bar'; but if we take the output of Dumper and try the same thing, $$VAR1='Bar'; we get a "Modification of a read-only value attempted" error, because $VAR1 now refers to the literal constant 'Foo' rather than to a variable.

So the output of Data::Dumper cannot be relied on to correctly recreate its input.
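
The round trip can be tried directly. The following is a small sketch, assuming only the core Data::Dumper module and its $Data::Dumper::Terse setting; the variable names $copy and $fixed are illustrative. It reports what happens on the Perl at hand rather than assuming the outcome, and then shows the \do { my $v = ... } form (the one that appears in the BFDump output further down) as a writable alternative.

```perl
use strict;
use warnings;
use Data::Dumper;

my $y = 'Foo';
my $x = \$y;

$$x = 'Bar';    # fine: $y is an ordinary, writable scalar

# Round-trip through Dumper's output. Terse drops the "$VAR1 = " prefix,
# so the string can be eval'd straight into a new variable.
local $Data::Dumper::Terse = 1;
my $dumped = Dumper($x);
my $copy   = eval $dumped;
die "could not eval Dumper output: $@" unless ref $copy;

# Try to modify the scalar behind the recreated reference.
my $ok = eval { $$copy = 'Baz'; 1 };
print "dumped as: $dumped";
print $ok ? "copy is writable\n" : "copy is not writable: $@";

# A reference to a fresh lexical, by contrast, can always be modified.
my $fixed = eval q{ \do { my $v = 'Bar' } };
$$fixed = 'Baz';
print "do-block form now holds: $$fixed\n";
```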

And now for Data::Dump. Data::Dump suffers from the same problem with references to scalars as Dumper (hardly surprising, as Data::Dump is based on Sarathy's Data::Dumper). But it has even more serious problems:

use Data::Dump qw(dump);   # import dump explicitly, or Perl's builtin dump is called
my ($x,$y);
$x=\$y;
$y=\$x;
print dump([$x,$y]);

will cause Data::Dump to go into an infinite loop, which obviously means that we should be cautious, to say the least, when using Data::Dump for anything non-trivial.
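
For comparison, the usual guard against this kind of loop is to remember the address of everything already visited. The sketch below illustrates that general technique only; it is not a description of how Data::Dump, Data::Dumper or Data::BFDump are implemented, and count_nodes is a made-up name.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Count the distinct referenced things reachable from $ref. The %$seen
# table is keyed on address, so anything inside a cycle is visited once.
sub count_nodes {
    my ( $ref, $seen ) = @_;
    return 0 unless ref $ref;
    return 0 if $seen->{ refaddr($ref) }++;    # been here before: stop
    my $type = reftype($ref);
    my @kids = $type eq 'ARRAY' ? @$ref
             : $type eq 'HASH'  ? values %$ref
             : $type eq 'REF'   ? ($$ref)
             :                    ();
    my $n = 1;
    $n += count_nodes( $_, $seen ) for @kids;
    return $n;
}

# Two scalars holding references to each other, as in the example above.
my ( $x, $y );
$x = \$y;
$y = \$x;

my $count = count_nodes( [ $x, $y ], {} );
print "visited $count distinct nodes\n";    # the array, $y and $x: 3
```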

Let this be a warning to those of you who use Data::Dumper or Data::Dump for persistence purposes (with Storable or MLDBM, for example).

So in the process of reinventing the wheel I discovered that the wheel isn't as good as I thought it was in the first place. I wonder how many other modules this applies to? CGI perhaps? Maybe we shouldn't be so harsh on those who think that a little reinventing of the wheel isn't a bad thing. You never know, we might end up with better wheels in the long run...

BTW, here's how my dumper (Data::BFDump) and Data::Dumper would handle that last example (it's a test case that I call "Scalar Cross"). I know which one I would rather try to figure out, and it shows the difference between the results of a depth-first traversal and a breadth-first traversal.

#Dumper Output with Purity on
$VAR1 = [
          \\do{my $o},
          do{my $o}
        ];
${${$VAR1->[0]}} = $VAR1->[0];
$VAR1->[1] = ${$VAR1->[0]};

#Standard BFDump output
do {
    my $RT_ARRAY = [
                     \do { my $t },
                     \do { my $t }
                   ];
    ${$RT_ARRAY->[0]} = $RT_ARRAY->[1];
    ${$RT_ARRAY->[1]} = $RT_ARRAY->[0];
    $RT_ARRAY
}
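
Both dumps are plain Perl, so they can be run and checked against the original structure. The sketch below copies the two outputs from above (adding only a my in front of $VAR1 so that it compiles under strict) and tests the defining property of the Scalar Cross: each array element is a reference to a scalar that holds the other element. The helper name is_scalar_cross is made up for illustration.

```perl
use strict;
use warnings;

# True if each element refers to a scalar that holds the other element.
sub is_scalar_cross {
    my ($pair) = @_;
    return ${ $pair->[0] } == $pair->[1]
        && ${ $pair->[1] } == $pair->[0];
}

# The original structure from the Data::Dump example.
my ( $x, $y );
$x = \$y;
$y = \$x;
my $original = [ $x, $y ];

# Data::Dumper's Purity output, as shown above.
my $VAR1 = [
    \\do { my $o },
    do { my $o }
];
${ ${ $VAR1->[0] } } = $VAR1->[0];
$VAR1->[1] = ${ $VAR1->[0] };

# The BFDump output, as shown above.
my $bfdump = do {
    my $RT_ARRAY = [
        \do { my $t },
        \do { my $t }
    ];
    ${ $RT_ARRAY->[0] } = $RT_ARRAY->[1];
    ${ $RT_ARRAY->[1] } = $RT_ARRAY->[0];
    $RT_ARRAY;
};

for ( [ original => $original ], [ Dumper => $VAR1 ], [ BFDump => $bfdump ] ) {
    my ( $name, $pair ) = @$_;
    printf "%-8s %s\n", $name,
        is_scalar_cross($pair) ? 'is a scalar cross' : 'is NOT a scalar cross';
}
```

Both reconstructions satisfy the check, so the difference between the two dumps here is one of readability, not correctness.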

Watch for the initial release of Data::BFDump on your local CPAN mirror over the next few days.

:-)

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.

Replies are listed 'Best First'.
•Re: Reinventing the wheel: Dumper Difficulties by merlyn (Sage) on Apr 10, 2002 at 15:34 UTC
While you're at it, fix the bug I mention in my p5p bug report, which is a difficult one to fix.

-- Randal L. Schwartz, Perl hacker
Re: •Re: Reinventing the wheel: Dumper Difficulties by demerphq (Chancellor) on Apr 10, 2002 at 15:58 UTC
Well, I don't support Dumper-style variable naming; I do a Dump-style do{} output instead. But here are your two cases. From what I can tell they come out correctly. Let me know if they are wrong.

UPDATE: Doh. I should have read your bug report more carefully. Obviously both of these are incorrect in the respect that you mention. But I'm pretty sure that I can resolve that. I'll let you know. Oh, and thanks: I've been meaning to get the Dumper test cases and run them through BFDump, but keep forgetting. So now that I've been reminded, the download is running... ;-)

use Data::BFDump qw(BFDump);
my @dogs = ( 'Fido', 'Wags' );
my %kennel = (
    First  => \$dogs[0],
    Second => \$dogs[1],
);
$dogs[2] = \%kennel;
my $mutts = \%kennel;
print "BFDump(\\\@dogs, \\\%kennel, \$mutts);\n";
print BFDump(\@dogs, \%kennel, $mutts),"\n\n";
print "BFDump(\\\%kennel, \\\@dogs, \$mutts);\n";
print BFDump(\%kennel, \@dogs, $mutts),"\n";
__END__
BFDump(\@dogs, \%kennel, $mutts);
do {
    my $RT_ARRAY = [
                     'Fido',
                     'Wags',
                     {}
                   ];
    my $RT_HASH = {
                    First  => \do { my $v = 'Fido' },
                    Second => \do { my $v = 'Wags' }
                  };
    $RT_ARRAY->[2] = $RT_HASH;
    ( $RT_ARRAY, $RT_HASH, $RT_HASH )
}

BFDump(\%kennel, \@dogs, $mutts);
do {
    my $RT_HASH = {
                    First  => \do { my $v = 'Fido' },
                    Second => \do { my $v = 'Wags' }
                  };
    my $RT_ARRAY = [
                     'Fido',
                     'Wags',
                     $RT_HASH
                   ];
    ( $RT_HASH, $RT_ARRAY, $RT_HASH )
}

Oh, and if anyone is wondering what the RT means in the variable names, it stands for root.

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
Re: Reinventing the wheel: Dumper Difficulties by clintp (Curate) on Apr 11, 2002 at 02:56 UTC
7 years in the auto industry has taught me that to re-invent a wheel is not enough. To be widely adopted, that wheel has to fit existing axles, fenders, and suspensions, as well as be familiar enough to mechanics -- with their tools -- who are ultimately the best salespeople of aftermarket equipment.

Do you have switch and output compatibility modes to plug into code that already uses DD or DD? hint hint

PS:

"Oh and if anyone is wondering what the RT means in the variable names, it stands for root."

Then waste the two bytes and call it root. :)
Re: Re: Reinventing the wheel: Dumper Difficulties by demerphq (Chancellor) on Apr 11, 2002 at 08:54 UTC
"Do you have switch and output compatibility modes to plug into code that already uses DD or DD? hint hint"

Well, as you can see from my current example, I support the Data::Dump-style interface and not the Data::Dumper interface. However, I'm sure I can include a wrapper that emulates Data::Dumper functionality. Although I have to say that my intention is not so much to create a drop-in replacement for either (although it should be up to the job) but rather a development tool for trying to visualize and analyze data structures.

Part of the reason that I take this perspective is that many trade-off decisions have been made in favor of analytical and presentational flexibility and utility, and not in favor of speed or memory overhead. So if you have to serialize a few million data structures then Data::BFDump is probably not the place to go (unless, of course, you are dumping structures that would be affected by the bugs I mentioned earlier). OTOH, if you are trying to figure out what data structure is being used by a new module, or why you keep getting weird results with that funky data structure you are developing, then my tool will probably be exactly what you want to use.

"Then waste the two bytes and call it root. :)"

Honest, I tried it. Trouble is that there are two prefixes used, "BF" and "RT", and the vars can be numbered as well, so if I use ROOT explicitly, in some situations the variable names get quite long indeed... Maybe I'll make it an option though. Compromise, eh? :-)

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
(MeowChow) Re: Reinventing the wheel: Dumper Difficulties by MeowChow (Vicar) on Apr 11, 2002 at 04:11 UTC
Have you taken a look at Data::Denter? Although the output format is not evalable (it has to be "undented"), I believe it handles circular references. I'm not sure if it deals with all the cases mentioned here, but it would be a good idea to make sure you're not reinventing a reinvented wheel :-)

MeowChow
s aamecha.s a..a\u$&owag.print
Re: (MeowChow) Re: Reinventing the wheel: Dumper Difficulties by demerphq (Chancellor) on Apr 11, 2002 at 08:44 UTC
Well, I'm familiar enough with Data::Denter to know that it doesn't meet at least one of my needs. One key issue for me is not that the dumper chosen can "handle" cyclic structures, but that it deals with them in a reasonable way.

Consider a situation where we have an array of objects, perhaps representing people. Now each object contains an array of references to some or all of the other objects, perhaps representing some kind of relationship between them. If I use a depth-first dumper to dump this array of objects, the dumper will dump the first object and then follow the references, dumping each object (which are not children!) as if they were children of the first object. Almost inevitably this will result in a large chunk of the top-level array containing references into the tree of parent and children objects.

This is not what I want. What I would rather see is an array containing objects which contain an array with references to the other objects. The fact that an object is in the top-level array means that it should be first mentioned in the context of this array, not as a child of another object.

Some other functionality that I've built in (and wanted) includes "slicing" out a many-mentioned object and declaring it first, so that instead of having many "fix" statements at the end we can simply have this object mentioned multiple times. In fact, I've got this implemented in such a way that the user can specify how many times an object needs to be referenced before it gets split out. In fact, it is precisely the "splitting" function that made me fail merlyn's test case. I need to resolve this in the next few days, but once I do I will be able to handle both of merlyn's tests.

Oh, and did I mention that I convert code refs to code when I dump?

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
Re: Re: (MeowChow) Re: Reinventing the wheel: Dumper Difficulties by jmcnamara (Monsignor) on Apr 11, 2002 at 08:52 UTC
"Oh and did I mention that I convert code refs to code when I dump?"

I sometimes read the newspaper. ;-)

-- John.

