http://qs321.pair.com?node_id=1182092

vr has asked for the wisdom of the Perl Monks concerning the following question:

I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (mentioned, actually, in Re: Sorting geometric coordinates based on priority), and found this:

If the approximate size of the data set is known, preallocating the hash improves performance.

keys my %cache = @in; $cache{$_} = KEY($_) for @in;

The following sets up the cache more efficiently, using a hash slice:

keys my %cache = @in; @cache{@in} = map KEY($_) => @in;
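
For concreteness, here is a runnable version of both quoted snippets. KEY is the article's placeholder for a key-extraction function, so the lc below is just my arbitrary stand-in:

use strict;
use warnings;

sub KEY { lc $_[0] }    # hypothetical stand-in for the article's KEY()

my @in = qw( Foo Bar Baz );

# loop version: preallocate, then fill one pair at a time
keys my %cache1 = @in;              # lvalue keys: @in in scalar context
$cache1{$_} = KEY($_) for @in;

# slice version: preallocate, then fill in a single list assignment
keys my %cache2 = @in;
@cache2{@in} = map KEY($_) => @in;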

I liked the idiom

keys %h = @a; @h{ @a } = ... ; # do something useful

and thought it would be nice to remember it and use it sometimes. Of course, I had used hash slices before, but mostly because they look so concise, and somehow because of that I felt the resulting code must also be more efficient. And here is an additional optimization through the "magical" use of keys as an lvalue, forcing scalar context on the array. Actually, keys mentions this optimization, but I had missed it before:

Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.
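
The effect of the lvalue keys can actually be observed through Hash::Util; a minimal sketch, assuming a recent Perl (5.26+) where Hash::Util exports num_buckets:

use strict;
use warnings;
use Hash::Util qw/ num_buckets /;   # num_buckets needs Perl 5.26+

my @a = map { "key$_" } 1 .. 1000;
my %h;
print num_buckets( %h ), "\n";   # typically 8 for a fresh hash
keys %h = @a;                    # @a in scalar context: room for 1000 keys
print num_buckets( %h ), "\n";   # e.g. 1024: buckets come in powers of two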

Then it struck me as strange that assigning to a large hash slice would still require this "preallocation", so I ran this test:

use strict;
use warnings;
use Benchmark qw/ cmpthese /;

for my $count ( 100, 1_000, 10_000, 100_000 ) {
    cmpthese( -5, {
        1 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            keys %h = @a;
            $h{ $_ } = log for @a;
            return \%h
        },
        2 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            $h{ $_ } = log for @a;
            return \%h
        },
        3 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            keys %h = @a;
            @h{ @a } = map { log } @a;
            return \%h
        },
        4 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            @h{ @a } = map { log } @a;
            return \%h
        },
    })
}

The log is here to imitate at least some payload (useful work), and to create longer hash keys (in case that matters). Each sub returns a reference so that Perl can't notice we don't need the hash and skip the work. Here are the results:

$count = 100:

     Rate    4    3    2    1
4  1507/s   --  -3%  -4%  -5%
3  1549/s   3%   --  -2%  -3%
2  1576/s   5%   2%   --  -1%
1  1593/s   6%   3%   1%   --

$count = 1_000:

    Rate    1    2    4    3
1  140/s   --  -3%  -4%  -4%
2  145/s   3%   --  -0%  -1%
4  145/s   4%   0%   --  -0%
3  146/s   4%   1%   0%   --

$count = 10_000:

     Rate    4    2    3    1
4  12.1/s   --  -7%  -8%  -9%
2  12.9/s   7%   --  -2%  -3%
3  13.1/s   9%   2%   --  -1%
1  13.3/s  10%   3%   1%   --

$count = 100_000:

   s/iter    4    1    2    3
4    1.39   --  -3%  -3%  -3%
1    1.35   3%   --  -0%  -0%
2    1.35   3%   0%   --  -0%
3    1.35   3%   0%   0%   --

No meaningful difference at all. So, are my tests flawed, or do the claims about the efficiency of slices and preallocation simply not hold any water?