comment on

I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (actually, mentioned here: Re: Sorting geometric coordinates based on priority), and found this:

If the approximate size of the data set is known, preallocating the hash improves performance.
keys my %cache = @in;
$cache{$_} = KEY($_) for @in;
[download]
The following sets up the cache more efficiently, using a hash slice:
keys my %cache = @in;
@cache{@in} = map KEY($_) => @in;
[download]

I liked the idiom

keys %h = @a;
@h{ @a } = ... ; # do something useful
[download]

and thought it would be nice to remember and use it sometimes. While, of course, I was using hash slices before, but rather because they look so concise and, somehow because of this, I felt that code, as a result, must, indeed, be more efficient. And now additional optimization through "magical" use of keys as lvalue, forcing scalar context on array. Actually, keys mentions this optimization, but I missed it before:

Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.

Then I thought it strange that assigning to large hash slice still requires this "preallocation". Then I ran this test:

use strict;
use warnings;
use Benchmark qw/ cmpthese /;;

for my $count ( 100, 1_000, 10_000, 100_000 ) {
    cmpthese( -5, {
        1 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            keys %h = @a;
            $h{ $_ } = log for @a;
            return \%h
        },
        2 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            $h{ $_ } = log for @a;
            return \%h
        },
        3 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            keys %h = @a;
            @h{ @a } = map { log } @a;
            return \%h
        },
        4 => sub {
            my @a = map { log } 2 .. $count;
            my %h;
            @h{ @a } = map { log } @a;
            return \%h
        },
    })
}
[download]

log is here to imitate at least some payload (useful work), and to create longer hash keys (if it matters). Returning a reference so that Perl doesn't sniff we don't need this hash and won't skip any work. And that's because of results:

    Rate   4   3   2   1
4 1507/s  -- -3% -4% -5%
3 1549/s  3%  -- -2% -3%
2 1576/s  5%  2%  -- -1%
1 1593/s  6%  3%  1%  --
   Rate   1   2   4   3
1 140/s  -- -3% -4% -4%
2 145/s  3%  -- -0% -1%
4 145/s  4%  0%  -- -0%
3 146/s  4%  1%  0%  --
    Rate   4   2   3   1
4 12.1/s  -- -7% -8% -9%
2 12.9/s  7%  -- -2% -3%
3 13.1/s  9%  2%  -- -1%
1 13.3/s 10%  3%  1%  --
  s/iter   4   1   2   3
4   1.39  -- -3% -3% -3%
1   1.35  3%  -- -0% -0%
2   1.35  3%  0%  -- -0%
3   1.35  3%  0%  0%  --
[download]

No meaningful difference at all. So, are my tests flawed, or claims about efficiency of slices and preallocation don't hold any water?

In reply to Does "preallocating hash improve performance"? Or "using a hash slice"? by vr

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


go ahead... be a heretic
	PerlMonks