I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (actually, mentioned here: Re: Sorting geometric coordinates based on priority), and found this:
If the approximate size of the data set is known, preallocating the hash improves performance.
keys my %cache = @in;
$cache{$_} = KEY($_) for @in;
The following sets up the cache more efficiently, using a hash slice:
keys my %cache = @in;
@cache{@in} = map KEY($_) => @in;
I liked the idiom
keys %h = @a;
@h{ @a } = ... ; # do something useful
and thought it would be nice to remember and use it sometimes. While, of course, I was using hash slices before, but rather because they look so concise and, somehow because of this, I felt that code, as a result, must, indeed, be more efficient. And now additional optimization through "magical" use of keys as lvalue, forcing scalar context on array. Actually, keys mentions this optimization, but I missed it before:
Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.
Then I thought it strange that assigning to large hash slice still requires this "preallocation". Then I ran this test:
use strict;
use warnings;
use Benchmark qw/ cmpthese /;;
for my $count ( 100, 1_000, 10_000, 100_000 ) {
cmpthese( -5, {
1 => sub {
my @a = map { log } 2 .. $count;
my %h;
keys %h = @a;
$h{ $_ } = log for @a;
return \%h
},
2 => sub {
my @a = map { log } 2 .. $count;
my %h;
$h{ $_ } = log for @a;
return \%h
},
3 => sub {
my @a = map { log } 2 .. $count;
my %h;
keys %h = @a;
@h{ @a } = map { log } @a;
return \%h
},
4 => sub {
my @a = map { log } 2 .. $count;
my %h;
@h{ @a } = map { log } @a;
return \%h
},
})
}
log is here to imitate at least some payload (useful work), and to create longer hash keys (if it matters). Returning a reference so that Perl doesn't sniff we don't need this hash and won't skip any work. And that's because of results:
Rate 4 3 2 1
4 1507/s -- -3% -4% -5%
3 1549/s 3% -- -2% -3%
2 1576/s 5% 2% -- -1%
1 1593/s 6% 3% 1% --
Rate 1 2 4 3
1 140/s -- -3% -4% -4%
2 145/s 3% -- -0% -1%
4 145/s 4% 0% -- -0%
3 146/s 4% 1% 0% --
Rate 4 2 3 1
4 12.1/s -- -7% -8% -9%
2 12.9/s 7% -- -2% -3%
3 13.1/s 9% 2% -- -1%
1 13.3/s 10% 3% 1% --
s/iter 4 1 2 3
4 1.39 -- -3% -3% -3%
1 1.35 3% -- -0% -0%
2 1.35 3% 0% -- -0%
3 1.35 3% 0% 0% --
No meaningful difference at all. So, are my tests flawed, or claims about efficiency of slices and preallocation don't hold any water?
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.