Bug when undefining a large hash

by oxone (Friar)
on Aug 22, 2008 at 13:09 UTC ( [id://706159] )

oxone has asked for the wisdom of the Perl Monks concerning the following question:

It seems that it takes a long time to undefine a large hash on some operating systems. The following code illustrates the problem:

use strict;
use warnings;

# Create a large hash
my %hash;
my $count = 3_000_000;
$hash{$count} = rand(10) while $count--;

# Undefine it
print "Undefining hash.\n";
undef %hash;
print "Done.\n";

Now, on Windows this does what I'd expect: the time taken for the script to undefine the hash is minimal. However, on FreeBSD using Perl 5.8.8 the same script takes around 14 seconds to complete that 'undef'.

Searching on PM, there is some relevant discussion in this node, which notes "there is/was a malloc bug that led to slow destruction of big data structures".

My first question is: is this bug documented anywhere, in terms of which OSes and Perl versions are affected, and exactly when the problem arises?

Second question: is this fixed in a version of Perl later than 5.8.8?

Final question: what are the possible workarounds?

From my own investigations, I can delay the problem to the end of the script by not undefining any large hashes, AND by making sure all large hashes are globals (so they don't get undefined when they go out of scope at the end of a function block, for example). However, on doing this (e.g. by commenting out the 'undef' line in the script above), the problem still occurs: there is a long delay after the script prints "Done.", but before it actually exits.

I could combine this approach with the POSIX-based hack suggested here, thereby pushing the problem to the end of script execution, then skipping past Perl's own garbage collection with "POSIX::_exit(0);".
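For illustration, a minimal sketch of that combined workaround (the global hash and the final POSIX::_exit call are my own example of the hack, not code from the linked node):

use strict;
use warnings;
use POSIX ();

# Keep the large hash global so it isn't torn down at block exit
our %hash;
my $count = 3_000_000;
$hash{$count} = rand(10) while $count--;

print "Done.\n";

# Exit immediately, skipping Perl's global destruction of %hash
POSIX::_exit(0);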

However, this all feels very hack-y, especially making all large hashes globals, which goes directly against the best practice of limiting scope of variables wherever appropriate.

Anybody know of any better workarounds?

Replies are listed 'Best First'.
Re: Bug when undefining a large hash
by moritz (Cardinal) on Aug 22, 2008 at 13:37 UTC
    I modified your script a little bit:
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Create a large hash
    my %hash;
    my $count = 3_000_000;
    $hash{$count} = rand(10) while $count--;

    # Undefine it
    print "Undefining hash.\n";
    my $before = time;
    undef %hash;
    printf "Done in %.03f seconds\n", time - $before;

    On my linux box it's about 1.5s for both perl 5.8.8 and perl 5.10.0.

    I suspect that might be due to a different malloc on Windows and Linux, so my next step will be to compile a perl with a different malloc and try again.

    Update: A perl configured with -Uusemymalloc took 1.4s, so no real difference.
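    In case anyone wants to reproduce that build, the Configure invocation was along these lines (the prefix path is just a placeholder):

    sh Configure -des -Uusemymalloc -Dprefix=/opt/perl-systemmalloc
    make && make test && make install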

      Your times reflect my own for Windows, and your Linux test suggests that Linux does not show this problem either. Perhaps this is a problem specific to Perl on FreeBSD?
      My MacBook Pro (2.16GHz Core 2 Duo) took 3.17 seconds. Darwin is a BSD-derived OS.
Re: Bug when undefining a large hash
by FunkyMonk (Chancellor) on Aug 22, 2008 at 13:52 UTC
    14 seconds overall, or 14 seconds just for the undef? When I try this on FreeBSD I find that it takes twice as long to create the hash as to undef it:

    $ perl hash.pl ; perl hash.pl ; perl hash.pl
    Create: 5.26445007324219
    Destroy: 1.82375288009644
    Create: 5.18459510803223
    Destroy: 1.82551097869873
    Create: 5.14320707321167
    Destroy: 1.8236780166626
    using a slightly different program:

    $ cat hash.pl
    #!/usr/bin/perl
    use strict;
    use warnings FATAL => 'all';
    use Time::HiRes 'time';

    my $t0 = time;
    my %hash;
    $hash{$_} = 1 for 1..3_000_000;
    my $t1 = time;
    undef %hash;
    my $t2 = time;
    print "Create: ", $t1-$t0, "\nDestroy: ", $t2-$t1, "\n";
    $ uname -a
    FreeBSD XXX.XXX.XXX 7.0-RELEASE-p2 FreeBSD 7.0-RELEASE-p2 #0: Wed Jun 18 06:48:16 UTC 2008 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
    $ perl -v
    This is perl, v5.8.8 built for amd64-freebsd

    And I get very similar results on Linux with perl 5.10.0 and 5.8.8.

      My 14 secs is just for the undef.

      I'm testing on FreeBSD 6.2-RELEASE-p1 with Perl "v5.8.8 built for i386-freebsd".

Re: Bug when undefining a large hash
by SuicideJunkie (Vicar) on Aug 22, 2008 at 13:32 UTC

    A less messy variant of that idea is to use just one global array called @garbage, and push references to your large hashes onto it.
    That way you would at least only have one global to worry about, and your structures would still stick around until the end.
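    Something along these lines (an untested sketch; the sub name and count are just placeholders):

    our @garbage;    # one global array that keeps big structures alive

    sub build_big_hash {
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;

        # Park a reference so %hash isn't freed when this sub returns
        push @garbage, \%hash;
        return \%hash;
    }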

    The response to the post you linked to sounds like it should solve your problem, provided that you have the ability to recompile Perl on the target systems: Re^3: Sort taking a long time to clean up

      Thanks: I like the @garbage idea which slightly reduces the hackiness of the workaround.

      (I'm afraid recompiling Perl isn't an option.)

Re: Bug when undefining a large hash
by dHarry (Abbot) on Aug 22, 2008 at 13:26 UTC

    Maybe you could try a two-step approach:

    %hash = ();
    undef %hash;
    to see if it improves the situation.

    update

    Maybe you can also try delete. It is supposed to be slower, but it would be interesting to compare. I have tried your example on Windows XP and Red Hat (both Perl 5.8) and it runs fine. Strangely enough, it takes a bit longer under Red Hat?! Unfortunately I don't have FreeBSD available.

      Thanks for suggestions! Results:

      "%hash = ();" appears to do much the same as undef(), at least, it creates the same long delay.

      Iterating all the keys and deleting them is slower than undef, although not by much: around 16 secs as opposed to 14 secs for undef.
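      For reference, the delete-based variant I timed was roughly this (a sketch, not the exact code):

      # Remove every key one at a time instead of undef-ing the whole hash
      delete $hash{$_} for keys %hash;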

      So, no solutions there, but thanks for the ideas.

        Frankly speaking, that didn't surprise me; I just thought it was worth a try. I tried it myself on Red Hat, also with values increased up to 30_000_000: same results as you.

Re: Bug when undefining a large hash
by aufflick (Deacon) on Aug 23, 2008 at 09:59 UTC

    FWIW here are my results on Mac OS X 10.5 PPC using MacPorts' perl 5.8.8:

                   s/iter With undef Without undef
    With undef       22.5         --           -9%
    Without undef    20.6        10%            --

    And with the system perl 5.8.8:

                   s/iter With undef Without undef
    With undef       22.5         --           -8%
    Without undef    20.7         9%            --

    So on this platform the undef time is certainly measurable, but roughly an order of magnitude less than the creation time.

    Here is the code I used for the benchmark comparisons:

    use strict;
    use warnings;
    use Benchmark qw(:all);

    sub without_undef {
        # Create a large hash
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;
    }

    sub with_undef {
        # Create a large hash
        my %hash;
        my $count = 3_000_000;
        $hash{$count} = rand(10) while $count--;

        # Undefine it
        undef %hash;
    }

    cmpthese(20, {
        'Without undef' => \&without_undef,
        'With undef'    => \&with_undef,
    });
      Thanks for the code; I used it to benchmark Perl 5.10.0 on Linux:
                     s/iter With undef Without undef
      With undef       17.2         --          -26%
      Without undef    12.8        34%            --
      which, although a bigger difference than yours, isn't as bad as what oxone reported. This was on:
      Linux 2.6.25 #6 SMP Tue Aug 5 17:42:15 SGT 2008 i686 GNU/Linux
      This is perl, v5.10.0 built for i486-linux-gnu-thread-multi
