http://qs321.pair.com?node_id=209044

Tanalis has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I'm in the middle of writing a script to handle a large volume of data, and populate a series of database tables with subsets of that data.

I've been using Data::Dumper to output the data at intervals so that I could validate its consistency with what I expect to have been pulled in.

Now, I'm reading some 16,000 data items, and hashing them on a variety of keys such that I end up with a 3-level-deep hash of hashes: %hash -> $key1 -> $key2 -> $key3 -> values.
The final data values are largely scalars, except for one, which is an array of 50 values per record, covering a series of potential situations we have to store data on.

I have attempted to output, using print Dumper \%hash, both the entire hash, and subsets of the hash (even down to an individual record), but it appears that Data::Dumper simply can't cope with the structure of the data. The script locks, waiting for the Dumper to return, and basically just sits there, eating system resources until the box locks up.

Data::Dumper works fine if I prevent the final array of values from being added to the hash. I have gone ahead and added the array data to the database, and validated it that way, so I am 100% certain that the array is being assigned correctly to the hash ($hash{$key1}{$key2}{$key3} = \@array;).

Basically, I'm interested to know if anyone else has experienced a similar issue with Data::Dumper and very deep HoHs and HoHoAs. If anyone's aware of a workaround, that'd be very useful too - it's vital I have some convenient method of data validation.

I'd like to post some code regarding the population of the hash, but it's huge (750 lines+) and I don't see the necessity. I'm certain the hash is populated correctly from the database; I'd just welcome a way of validating that data before it gets to the DB, which is what Data::Dumper was previously being used for.

Cheers ..
-- Foxcub

Replies are listed 'Best First'.
Re: Data::Dumper Limitations
by hacker (Priest) on Oct 30, 2002 at 13:35 UTC
    Have you considered looking into Data::BFDump from one of our own monks, demerphq here?

    In many cases, it solves some problems found in Data::Dumper. He explains one reason why it exists on his diary page (sorry, demerphq, for referencing your diary entry, but it was the only place I could find that gives some background on it).

Re: Data::Dumper Limitations
by fglock (Vicar) on Oct 30, 2002 at 13:36 UTC

    Data::Dumper tracks "seen" references, and this takes a lot of memory for big structures.

    Maybe this helps:

    $Data::Dumper::Maxdepth or $OBJ->Maxdepth([NEWVAL])

Can be set to a positive integer that specifies the depth beyond which we don't venture into a structure. Has no effect when "Data::Dumper::Purity" is set. (Useful in debugger when we often don't want to see more than enough). Default is 0, which means there is no maximum depth.
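
    For example, a minimal sketch (assuming the %hash from the original post) that only dumps the first two levels:

    use Data::Dumper;

    # Only descend two levels; deeper references are printed as e.g. 'HASH(0x...)'
    local $Data::Dumper::Maxdepth = 2;
    print Dumper(\%hash);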

Re: Data::Dumper Limitations
by rdfield (Priest) on Oct 30, 2002 at 15:10 UTC
    I managed to break it with a HoAoHoAoHoA with a mere 100 entries... I watched the memory usage climb to 1.5GB before it gave up. It worked OK with 75, which was enough for me to verify the structure, so I didn't bother pursuing a solution. For the record, Storable worked OK with the structure, so persistence was achievable (into an Oracle LOB as it happens).

    rdfield

Re: Data::Dumper Limitations
by PodMaster (Abbot) on Oct 30, 2002 at 13:37 UTC
    Have you tried DumperX?

    `perldoc Data::Dumper'
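
    For reference, a minimal sketch assuming the %hash from the original post (DumperX is the XS-backed counterpart of Dumper):

    use Data::Dumper;

    # DumperX uses the XS implementation, where available, instead of the pure-Perl one
    print Data::Dumper::DumperX(\%hash);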

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Data::Dumper Limitations
by redemption (Sexton) on Oct 30, 2002 at 18:33 UTC
    Hmm this seems eerily identical to something I was doing a few months back. I'd used Data::Dumper too and yes it breaks when the hash gets too big and deep. I think it depends on the memory you have available.

    You can do it the hard way and write your own foreach loops to print what you want. That was what I did and, believe me, it actually helps you understand your data structure oh so much better :D
Re: Data::Dumper Limitations
by Anonymous Monk on Oct 30, 2002 at 18:44 UTC
    I don't know how XML::Simple would fare if Data::Dumper croaks, but if it does work it should allow you to verify your HoHoA properly.
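
    A minimal sketch of that approach, assuming the %hash from the original post:

    use XML::Simple;

    # KeyAttr => [] stops XML::Simple from folding hashes on 'name'/'key'/'id' keys;
    # NoAttr => 1 keeps everything as nested elements, which is easier to read
    print XMLout(\%hash, KeyAttr => [], NoAttr => 1);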

      XML::Simple also tracks 'seen' references so that it can avoid circular references. So if that's what's causing problems for Data::Dumper it will probably cause problems for XML::Simple.

      Perhaps YAML might be worth a look (although it will need to track circular references too).
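
      A minimal YAML sketch, again assuming the %hash from the original post:

      use YAML;

      # Dump() returns the structure as a YAML string, which is fairly readable
      # for spot-checking deep hashes of hashes
      print Dump(\%hash);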

Re: Data::Dumper Limitations
by Popcorn Dave (Abbot) on Oct 31, 2002 at 00:41 UTC
    I'm not sure if this would work for you but have you thought about the Perl Tk debugger? (ptkdb)

    I've used it to look at the values in my hash tables for single hashes and you can expand or contract those. Whether it will work on nested hashes, I can't say because I haven't tried.

    Hope that helps!

    There is no emoticon for what I'm feeling now.

Re: Data::Dumper Limitations
by seattlejohn (Deacon) on Oct 31, 2002 at 08:38 UTC
    At the very least, I've seen Data::Dumper get painfully slow with complex data structures. It might be easier and faster to simply roll your own. Something like this would probably get you 90% of the way there:
    foreach my $key1 (keys %hash) {
        print "$key1:\n";
        foreach my $key2 (keys %{$hash{$key1}}) {
            print "  $key2:\n";
            foreach my $key3 (keys %{$hash{$key1}{$key2}}) {
                print "    $key3:\n      ",
                      join(",", @{$hash{$key1}{$key2}{$key3}}), "\n";
            }
        }
    }

            $perlmonks{seattlejohn} = 'John Clyman';

(Solution) Re: Data::Dumper Limitations
by jplindstrom (Monsignor) on Oct 31, 2002 at 23:47 UTC
    I have run into this problem. When the data structure went above a certain size, the memory consumption went through the roof.

    The solution/workaround is this:

    local $Data::Dumper::Useqq = 1;

    This will (among other things; perldoc it for details) make Data::Dumper use the pure-Perl implementation, which doesn't seem to have this problem.

    Ok, it's slower for small data structures (so who cares?), but it works for large ones.
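
    Put together, a minimal sketch (assuming the %hash from the original post):

    use Data::Dumper;

    {
        # Force the pure-Perl implementation for this dump only
        local $Data::Dumper::Useqq = 1;
        print Dumper(\%hash);
    }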


    /J

      My apologies. I wrote my reply without reading the others.

      Funny how we said almost exactly the same thing. The only difference being that I don't recommend localizing the value: I would rather it work all the time than work fast most of the time. :-)

      --- demerphq
      my friends call me, usually because I'm late....

Re: Data::Dumper Limitations
by demerphq (Chancellor) on Nov 01, 2002 at 13:11 UTC
    If you are on Windows then this is a known bug in the XS implementation. The solution is simple.
    $Data::Dumper::Useqq=1;
    Which forces the use of the Perl implementation, which does not suffer from the problem. For smaller dump sizes you will notice a performance penalty. For larger dump sizes you will notice that the dump actually finishes and does not run out of memory. ;-)

    --- demerphq
    my friends call me, usually because I'm late....

Re: Data::Dumper Limitations
by jbeninger (Monk) on Oct 31, 2002 at 18:37 UTC
    I whipped up a module I called DataLocker before I was aware of Data::Dumper. I just put it up on my scratchpad as I don't really have anything else available. I think it will be able to handle your data structure without much trouble.
    DataLocker::store($filename, $data_structure);
    $data_structure = DataLocker::retrieve($filename);
    There may be a few problems though -
    1: It was designed for file storage, so you won't be able to simply print it to STDERR.
    2: It's not pretty. The output was meant to be machine-read, not human-read.

    Feel free to email me if you have any questions. Also, to all you monks out there, I apologize if this is a redundant, hacked-together version of something better that already exists; I'm still making my way up the learning curve when it comes to CPAN in all its glory.

Re: Data::Dumper Limitations
by wufnik (Friar) on Nov 01, 2002 at 14:32 UTC
    hola; grepped through *.pl for 'Data::Dumper', sure enough;
    "# use Data::Dumper; no! memory problems with large structs"
    typically experienced on my well-endowed PC.

    i remember it all now. my keenness to use serialization; falling for data::dumper; sleepless nights, the brief, intense romance; the awful realization - in the long term, my relation with freeze thaw was going to be a limited one, that i had better try Freeze::Thaw, or even Storable.

    in the end i turned to files, sockets, parsing; ashes to ashes, dumper to dust...

    rgds, w.

Re: Data::Dumper Limitations
by chicks (Scribe) on Nov 02, 2002 at 13:52 UTC
    I personally had similar Data::Dumper grief on a recent project. I switched to Storable and I haven't looked back since. It's very easy to use, it's fast, and it doesn't die under load the way Data::Dumper does. I still use Data::Dumper for dumping part of a data structure into a log file for debugging purposes, but for production use it's just too scary.
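
    A minimal sketch of the Storable approach, assuming the %hash from the original post (the file name hash.stor is just a placeholder):

    use Storable qw(nstore retrieve);

    # Write the structure to disk in network byte order for portability...
    nstore(\%hash, 'hash.stor');

    # ...and read it back later; retrieve() returns a reference
    my $href = retrieve('hash.stor');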
Re: Data::Dumper Limitations
by vitalipom (Initiate) on Feb 04, 2018 at 12:19 UTC
    Hi, I was looking at this thread since I was also running into what looked like a limitation. Please try "use strict" in such cases. Here's an error message I got about something wrong that I did, which helped me resolve the issue (along with some typo warnings I also got): "Can't use string ("A") as a HASH ref while "strict refs" in use at test.pl line 10".

    It tells me that I did something like this (which is wrong):

     8  my %simple_hash = ();
     9  $simple_hash{'a'} = "A";
    10  $simple_hash{'a'}{'b'} = "B";

    Thanks,

    Vitali.Pom