Bug when undefining a large hash

by oxone (Friar)
on Aug 22, 2008 at 13:09 UTC ( [id://706159] )

oxone has asked for the wisdom of the Perl Monks concerning the following question:

It seems that it takes a long time to undefine a large hash on some operating systems. The following code illustrates the problem:

use strict;
use warnings;

# Create a large hash
my %hash;
my $count = 3_000_000;
$hash{$count} = rand(10) while $count--;

# Undefine it
print "Undefining hash.\n";
undef %hash;
print "Done.\n";

Now, on Windows this does what I'd expect: the time taken for the script to undefine the hash is minimal. However, on FreeBSD using Perl 5.8.8 the same script takes around 14 seconds to complete that 'undef'.

Searching on PM, there is some relevant discussion in this node, which notes "there is/was a malloc bug that led to slow destruction of big data structures".

My first question is: is this bug documented anywhere, in terms of which OSes and Perl versions are affected, and exactly when the problem arises?

Second question: is this fixed in a version of Perl later than 5.8.8?

Final question: what are the possible workarounds?

From my own investigations, I can delay the problem to the end of the script by not undefining any large hashes, AND by making sure all large hashes are globals (so they don't get undefined when they go out of scope at the end of a function block, for example). However, on doing this (e.g. by commenting out the 'undef' line in the script above), the problem still occurs: there is a long delay after the script prints "Done.", but before it actually exits.

I could combine this approach with the POSIX-based hack suggested here, thereby pushing the problem to the end of script execution, then skipping past Perl's own garbage collection with "POSIX::_exit(0);".
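For illustration, a minimal sketch of that combined workaround (the global hash and the final POSIX::_exit call are my own example of the hack, not code from the linked node):

use strict;
use warnings;
use POSIX ();

# Keep the large hash global so it isn't torn down at block exit
our %hash;
my $count = 3_000_000;
$hash{$count} = rand(10) while $count--;

print "Done.\n";

# Exit immediately, skipping Perl's global destruction of %hash
POSIX::_exit(0);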

However, this all feels very hack-y, especially making all large hashes globals, which goes directly against the best practice of limiting scope of variables wherever appropriate.

Anybody know of any better workarounds?

Replies are listed 'Best First'.
Re: Bug when undefining a large hash
by moritz (Cardinal) on Aug 22, 2008 at 13:37 UTC
    I modified your script a little bit:
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Create a large hash
    my %hash;
    my $count = 3_000_000;
    $hash{$count} = rand(10) while $count--;

    # Undefine it
    print "Undefining hash.\n";
    my $before = time;
    undef %hash;
    printf "Done in %.03f seconds\n", time - $before;

    On my linux box it's about 1.5s for both perl 5.8.8 and perl 5.10.0.

    I suspect that might be due to a different malloc on Windows and Linux, so my next step will be to compile a perl with a different malloc and try again.

    Update: A perl configured with -Uusemymalloc took 1.4s, so no real difference.
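    In case anyone wants to reproduce that build, the Configure invocation was along these lines (the prefix path is just a placeholder):

    sh Configure -des -Uusemymalloc -Dprefix=/opt/perl-systemmalloc
    make && make test && make install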

      Your times reflect my own for Windows, and your Linux test suggests that Linux does not show this problem either. Perhaps this is a problem specific to Perl on FreeBSD?
      My MacBook Pro (2.16GHz Core 2 Duo) took 3.17 seconds. Darwin is a BSD-derived OS.
Re: Bug when undefining a large hash
by FunkyMonk (Chancellor) on Aug 22, 2008 at 13:52 UTC
    14 seconds overall, or 14 seconds just for the undef? When I try this on FreeBSD I find that it takes twice as long to create the hash as to undef it:

    $ perl hash.pl ; perl hash.pl ; perl hash.pl
    Create: 5.26445007324219
    Destroy: 1.82375288009644
    Create: 5.18459510803223
    Destroy: 1.82551097869873
    Create: 5.14320707321167
    Destroy: 1.8236780166626
    using a slightly different program:

    $ cat hash.pl
    #!/usr/bin/perl
    use strict;
    use warnings FATAL => 'all';
    use Time::HiRes 'time';

    my $t0 = time;
    my %hash;
    $hash{$_} = 1 for 1..3_000_000;
    my $t1 = time;
    undef %hash;
    my $t2 = time;
    print "Create: ", $t1-$t0, "\nDestroy: ", $t2-$t1, "\n";
    $ uname -a
    FreeBSD XXX.XXX.XXX 7.0-RELEASE-p2 FreeBSD 7.0-RELEASE-p2 #0: Wed Jun 18 06:48:16 UTC 2008 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
    $ perl -v
    This is perl, v5.8.8 built for amd64-freebsd

    And I get very similar results on Linux with perl 5.10.0 and 5.8.8.

      My 14 secs is just for the undef.

      I'm testing on FreeBSD 6.2-RELEASE-p1 with Perl "v5.8.8 built for i386-freebsd".

Re: Bug when undefining a large hash
by SuicideJunkie (Vicar) on Aug 22, 2008 at 13:32 UTC

    A less messy variant of that idea is to use just one global array called @garbage, and push references to your large hashes onto it.
    That way you would at least only have one global to worry about, and your structures would still stick around until the end.
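    Something along these lines (an untested sketch; the sub name and count are just placeholders):

    our @garbage;    # one global array that keeps big structures alive

    sub build_big_hash {
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;

        # Park a reference so %hash isn't freed when this sub returns
        push @garbage, \%hash;
        return \%hash;
    }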

    The response to the post you linked to sounds like it should solve your problem, provided that you have the ability to recompile Perl on the target systems: Re^3: Sort taking a long time to clean up

      Thanks: I like the @garbage idea which slightly reduces the hackiness of the workaround.

      (I'm afraid recompiling Perl isn't an option.)

Re: Bug when undefining a large hash
by dHarry (Abbot) on Aug 22, 2008 at 13:26 UTC

    Maybe you could try a two-step approach:

    %hash = ();
    undef %hash;
    to see if it improves the situation.

    update

    Maybe you can also try delete. It is supposed to be slower, but it would be interesting to compare. I have tried your example on Windows XP and Red Hat (both Perl 5.8) and it runs fine. Strangely enough, it takes a bit longer under Red Hat?! Unfortunately I don't have FreeBSD available.

      Thanks for suggestions! Results:

      "%hash = ();" appears to do much the same as undef(), at least, it creates the same long delay.

      Iterating all the keys and deleting them is slower than undef, although not by much: around 16 secs as opposed to 14 secs for undef.
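      For reference, the delete-based variant I timed was roughly this (a sketch, not the exact code):

      # Remove every key one at a time instead of undef-ing the whole hash
      delete $hash{$_} for keys %hash;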

      So, no solutions there, but thanks for the ideas.

        Frankly speaking, that didn't surprise me; I just thought it was worth a try. I tried it myself on Red Hat, also with values increased up to 30_000_000: same results as you.

Re: Bug when undefining a large hash
by aufflick (Deacon) on Aug 23, 2008 at 09:59 UTC

    FWIW here are my results on Mac OS X 10.5 PPC using MacPorts' perl 5.8.8:

                   s/iter With undef Without undef
    With undef       22.5         --           -9%
    Without undef    20.6        10%            --

    And with the system perl 5.8.8:

                   s/iter With undef Without undef
    With undef       22.5         --           -8%
    Without undef    20.7         9%            --

    So on this platform the undef time is certainly measurable, but roughly an order of magnitude less than the creation time.

    Here is the code I used for the benchmark comparisons:

    use strict;
    use warnings;
    use Benchmark qw(:all);

    sub without_undef {
        # Create a large hash
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;
    }

    sub with_undef {
        # Create a large hash
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;

        # Undefine it
        undef %hash;
    }

    cmpthese(20, {
        'Without undef' => \&without_undef,
        'With undef'    => \&with_undef,
    });
      Thanks for the code; I used it to benchmark Perl 5.10.0 on Linux:
                     s/iter With undef Without undef
      With undef       17.2         --          -26%
      Without undef    12.8        34%            --
      which, although a bigger difference than yours, isn't as bad as what oxone reported. This was on:
      Linux 2.6.25 #6 SMP Tue Aug 5 17:42:15 SGT 2008 i686 GNU/Linux
      This is perl, v5.10.0 built for i486-linux-gnu-thread-multi
