Re: '%hash = ()' is slower than 'undef %hash'

rsFalse:

For many applications, that's a false economy. If you're going to reuse the hash, it seems to me cheaper to clear it (%hash = ()) than to destroy the hash container and then recreate it (undef %hash):

$ perl pm_hash_clear_vs_undef_1214851.pl
Checking container differences between delete and clear:
Hash size (initial): 0/8
Hash size (after fill): 62/64
Hash size (after clear): 0/64
Hash size (after delete): 0/8

             Rate    delete     reuse overwrite
delete    38049/s        --       -3%      -39%
reuse     39358/s        3%        --      -37%
overwrite 62375/s       64%       58%        --

Overwriting the hash is obviously the fastest, as you needn't clear or destroy the container. Of course for many applications you'd have the added headache of ensuring that old data and current data don't mix.
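
To make that headache concrete, here's a minimal sketch. The key names and the load_batch helper are made up for illustration and aren't part of the benchmark. If the second batch has fewer keys than the first, the leftovers from the first batch are still sitting in the hash:

```perl
use strict;
use warnings;

# Hypothetical helper: store a batch of key/value pairs without clearing first.
sub load_batch {
    my ($hr, %batch) = @_;
    @{$hr}{keys %batch} = values %batch;
    return;
}

my %h;
load_batch(\%h, a => 1,  b => 2, c => 3);   # first dataset
load_batch(\%h, a => 10, b => 20);          # second dataset, no 'c'

# 'c' is stale data left over from the first batch.
print join(', ', map { "$_=$h{$_}" } sort keys %h), "\n";   # a=10, b=20, c=3
```

Whether that matters depends entirely on whether every run is guaranteed to write the same set of keys.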

Clearing the hash container allows you to reuse the hash without mixing old and current data, but might appear to be slower than simply deleting the hash container.

Deleting the hash container might appear to be faster until you also account for the time it takes to recreate the container when you next use it. It matters a little more than it appears, though: clearing the hash leaves the container at the same size (64 buckets in the output above), so reusing the hash is slightly faster than deleting and recreating it, because perl can avoid many of the container resize operations as it adds the keys. On the positive side, though, deleting the container may allow your application to reclaim some memory when some datasets have significantly more keys than are ordinarily needed. (Although I expect that would be as insignificant as the savings from the resizes just mentioned.)
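
Here's a rough way to look at the container size with a bigger key set. It's a sketch, not a timing measurement: the 10_000 key count and the bucket_count helper are my own choices for illustration. It also uses keys(%h) = N, which perlfunc documents as a way to preallocate buckets, in case you want to undef the hash for the memory and still skip the incremental resizes on the refill:

```perl
use strict;
use warnings;
use Hash::Util 'bucket_stats';

# Hypothetical helper: the second value from bucket_stats is the bucket count.
sub bucket_count {
    my $hr = shift;
    return (bucket_stats($hr))[1];
}

my @many_keys = (1 .. 10_000);   # arbitrary size, chosen for illustration
my %h;

@h{@many_keys} = (0) x @many_keys;
my $after_fill = bucket_count(\%h);

%h = ();                         # clear: the buckets stay allocated
my $after_clear = bucket_count(\%h);

undef %h;                        # delete: the bucket array is released
my $after_undef = bucket_count(\%h);

keys(%h) = scalar @many_keys;    # preallocate before refilling
my $after_prealloc = bucket_count(\%h);

printf "fill=%d clear=%d undef=%d prealloc=%d\n",
    $after_fill, $after_clear, $after_undef, $after_prealloc;
```

I'd expect the clear and fill numbers to match and the undef number to drop back to the default, but the exact bucket counts depend on the perl build.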

At least that's how I see it... I'm providing the benchmark so you can point out what I may be missing...

$ cat pm_hash_clear_vs_undef_1214851.pl
use strict;
use warnings;
use Benchmark ':all';
use Hash::Util 'bucket_stats';

my @some_keys = ('A' .. 'Z', 'a' .. 'z', '0' .. '9');

print "Checking container differences between delete and clear:\n";
my %gh;
print "Hash size (initial): ", old_hash_stats(\%gh), "\n";
fill_hash(\%gh);
print "Hash size (after fill): ", old_hash_stats(\%gh), "\n";
%gh=();
print "Hash size (after clear): ", old_hash_stats(\%gh), "\n";
undef %gh;
print "Hash size (after delete): ", old_hash_stats(\%gh), "\n";
print "\n";

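sub old_hash_stats {
    my $hr = shift;
    # bucket_stats returns (keys, buckets, used buckets, ...);
    # report "keys/buckets", e.g. 62/64 after the fill.
    my @hash_stats = bucket_stats($hr);
    return "$hash_stats[0]/$hash_stats[1]";
}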

sub fill_hash {
    my $hr = shift;
    @{$hr}{@some_keys} = (0) x @some_keys;
}

cmpthese(500000, {

    'overwrite' => sub {
        my %h;
        fill_hash(\%h);
        fill_hash(\%h);
    },

    'delete' => sub {
        my %h;
        fill_hash(\%h);
        undef %h;
        fill_hash(\%h);
    },

    'reuse' => sub {
        my %h;
        fill_hash(\%h);
        %h = ();
        fill_hash(\%h);
    },
});

Update: It seems that the behavior of scalar(%h) changed in version 5.25.3 from displaying "buckets used/bucket count" to simply the number of keys in the hash. Anyway, with the Hash::Util function bucket_stats we can still get the information. I've edited the text and benchmark accordingly, and rearranged things a little for readability. (Thanks to choroba for the note.)
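
For anyone who wants the old "used/total" string back: as far as I can tell from the Hash::Util docs, bucket_stats returns the key count, the bucket count, and the number of used buckets as its first three values, so a small helper can rebuild it. A sketch, where buckets_used_total is a name I made up:

```perl
use strict;
use warnings;
use Hash::Util 'bucket_stats';

# Hypothetical helper mimicking the string that scalar(%h) used to return.
sub buckets_used_total {
    my $hr = shift;
    my ($keys, $buckets, $used) = bucket_stats($hr);
    return "$used/$buckets";
}

my %h;
@h{'A' .. 'Z', 'a' .. 'z', '0' .. '9'} = (0) x 62;

print 'scalar(%h):         ', scalar(%h), "\n";   # key count on newer perls
print 'buckets used/total: ', buckets_used_total(\%h), "\n";
```

The used-bucket figure varies from run to run on perls with hash randomization, so don't expect a stable number there.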

Update 2: Added the bit about how destroying the container lets you reclaim memory, as a possible benefit.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
