Re: better array to hash conversion

Replies are listed 'Best First'.

Re^2: better array to hash conversion
by BrowserUk (Patriarch) on Dec 11, 2012 at 14:41 UTC

As the array size grows, it doesn't take long for the OPs original to out pace variation1. It only requires a 200,000 or so for that to happen, and the benefits mount geometrically as the array size grows:

#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(:all);

our @array = 'aaaa' .. 'lzzz';
print "$#array\n";

sub original {
  my %hash;
  for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx;}
}

sub variation1 {
  my %hash;
  @hash{ @array } = 0 .. $#array;
}

sub variation2 {
  my %hash = map { $array[$_] => $_ } 0..$#array;
}

sub variation3 {
  my $idx = 0;
  my %hash = map { $_ => $idx++ } @array;
}

sub variation4 {
  my $idx = 0;
  my %hash; $hash{ $_ } = $idx++ for @array;
}


cmpthese -5, {
   'original'   => \&original,
   'variation1' => \&variation1,
   'variation2' => \&variation2,
   'variation3' => \&variation3,
   'variation4' => \&variation4,
};

__END__
C:\test>junk91
210911
             Rate variation2 variation3 variation1   original variatio
+n4
variation2 2.08/s         --        -2%       -36%       -38%       -4
+2%
variation3 2.12/s         2%         --       -35%       -37%       -4
+1%
variation1 3.26/s        57%        54%         --        -3%        -
+9%
original   3.37/s        62%        59%         3%         --        -
+6%
variation4 3.57/s        72%        68%         9%         6%         
+--
[download]

(I've added another variation that works better for large arrays.)

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

[reply]
[d/l]

Re^3: better array to hash conversion

by karlgoethebier (Abbot) on Dec 11, 2012 at 15:30 UTC

This is very interesting. Do you know why - and like to explain it?

Thanks and best regards, Karl

ŤThe Crux of the Biscuit is the Apostropheť

[reply]

Re^4: better array to hash conversion

by BrowserUk (Patriarch) on Dec 11, 2012 at 15:38 UTC

Because all of the map-based methods construct, copy and later discard, multiple, intermediary lists on their way to constructing the hash.

And the costs of doing large numbers of small memory allocations and deallocations add up; especially if they memory manager has to go to the OS a couple of times to expand the process memory pool.

Using for avoids constructing many of the intermediary lists.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

[reply]

Re^5: better array to hash conversion

by tobyink (Canon) on Dec 11, 2012 at 16:24 UTC

Re^6: better array to hash conversion (wrong metric)

by tye (Sage) on Dec 11, 2012 at 16:44 UTC

Some notes below your chosen depth have not been shown here

Re^6: better array to hash conversion

by karlgoethebier (Abbot) on Dec 12, 2012 at 07:31 UTC

Re^5: better array to hash conversion

by karlgoethebier (Abbot) on Dec 12, 2012 at 07:32 UTC

Re^2: better array to hash conversion
by GrandFather (Saint) on Dec 11, 2012 at 20:10 UTC

This is premature optimisation gone mad! For most purposes for speed to be a factor in this decision the array would need to contain of the order of 1 million entries.

Sure, benchmarks are fun to write (although often hard to make meaningful), but the overwhelming criteria in this sort of coding decision is clarity and maintainability of the code. By that metric on both counts 'original' is way down the list. I'd go for variation 1 or a for modifier version of 'original', either of which is clear, succinct and not particularly prone to coding errors.

True laziness is hard work

[reply]

Re^3: better array to hash conversion

by BrowserUk (Patriarch) on Dec 12, 2012 at 08:47 UTC

This is premature optimisation gone mad! For most purposes for speed to be a factor in this decision the array would need to contain of the order of 1 million entries.

Actually, significant differences start at around 200,000; and as someone who regularly does similar processing with 10s and even 100s of millions, knowing what works quickest whilst avoiding unnecessary memory growth is important.

And unless you have some psychic insight to the OPs application, you have nothing on which to base your conclusions, so it is they that are "premature".

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

[reply]

Re^2: better array to hash conversion
by perltux (Monk) on Dec 11, 2012 at 14:27 UTC

Many thanks for that. I find it very interesting that the 'for' loop is actually the second fastest solution, faster than the solutions using 'map'.

[reply]

Re^3: better array to hash conversion

by space_monk (Chaplain) on Dec 11, 2012 at 19:34 UTC

A for solution is generally faster than a map, and it has been discussed in map versus for amongst other places. Sometimes however it is more expressive to use "map".

A Monk aims to give answers to those who have none, and to learn from those who know more.

[reply]

Re^2: better array to hash conversion
by Anonymous Monk on Dec 11, 2012 at 19:09 UTC

presize/preallocate buckets , change the number of elements, change the perl version, and the winner changes :)

funny bench :)

#!/usr/bin/perl --
use strict; use warnings;
use Benchmark qw/ cmpthese /;

#~ my $cmptime = -abs(shift);
#~ my $cmptime = -3;
my $cmptime = -10;
printf "%10s cmptime %s\n", $], $cmptime;
my @array;
for my $power ( 8, 16,18 ) {
#~ for my $power ( 8, 16, 20, 24 ){
    @array = 1 .. ( 2**$power );
    print "2 ** $power == @{[int @array ]}\n";
    cmpthese(
        $cmptime, {
            map { ; $_ => __PACKAGE__->can( $_ ) }
            qw/ ffor fforc
            ifor imap mmap
            sffor sfforc sifor simap
            slic smap simap2
            /
        }
    );
}
exit;

sub mmap {
    my %hash = map { ; $array[$_] => $_ } 0 .. $#array;
    ();
}

sub imap {
    my $ix = 0;
    my %hash = map { ; $_ => $ix++ } @array;
    ();
}
sub slic { my %hash; @hash{@array} = 0 .. $#array; (); }

sub ffor {
    my %hash;
    for my $ix ( 0 .. $#array ) { $hash{ $array[$ix] } = $ix; }
    ();
}

sub fforc {
    my %hash;
    for( my $ix = 0 ; $ix < @array ; $ix++ ) { $hash{ $array[$ix] } = 
+$ix; }
    ();
}

sub ifor {
    my %hash;
    my $ix = 0;
    $hash{$_} = $ix++ for @array;
    ();
}

## preSize
sub sffor {
    my %hash;
    keys( %hash ) = int @array;
    for my $ix ( 0 .. $#array ) { $hash{ $array[$ix] } = $ix; }
    ();
}

sub sfforc {
    my %hash;
    keys( %hash ) = int @array;
    for( my $ix = 0 ; $ix < @array ; $ix++ ) { $hash{ $array[$ix] } = 
+$ix; }
    ();
}

sub sifor {
    my %hash;
    my $ix = 0;
    keys( %hash ) = int @array;
    $hash{$_} = $ix++ for @array;
    ();
}

sub smap {
    my %hash;
    keys( %hash ) = int @array;
    @hash{@array} = 0 .. $#array;
    ();
}

sub simap {
    my $i = 0;
    my %hash;
    keys( %hash ) = int @array;
    %hash = map { $_ => $i++ } @array;
    ();
}


sub simap2 {
    my $i = 0;
    my %hash;
    keys( %hash ) = int @array;
    %hash = map(( $_ => $i++ ), @array);
    ();
}

__END__
[download]

funny results :)

5.008008 c -10
2 ** 8 == 256
Rate simap2
simap2 2721/s
simap 2770/s
mmap 3196/s
imap 3505/s
fforc 3981/s
sffor 4335/s
sfforc 4535/s
ffor 4838/s
ifor 5027/s
sifor 5798/s
smap 8123/s
slic 9384/s


5.008008 c -10
2 ** 16 == 65536
Rate simap2
simap2 3.82/s
simap 3.83/s
mmap 4.53/s
imap 5.26/s
sffor 7.86/s
sfforc 7.89/s
fforc 8.09/s
smap 8.46/s
sifor 8.68/s
ffor 8.83/s
ifor 8.93/s
slic 9.47/s


5.008008 c -10
2 ** 18 == 2621443



Rate simap2
simap2 0.810/s
simap 0.825/s
mmap 0.979/s
imap 1.18/s
smap 1.68/s
sfforc 1.75/s
sffor 1.80/s
sifor 1.90/s
fforc 1.90/s
slic 1.93/s
ffor 2.08/s
ifor 2.10/s
[download]

5.008009 c -10
2 ** 8 == 256
Rate simap
simap 1767/s
simap2 1775/s
mmap 2017/s
imap 2183/s
fforc 2980/s
sffor 3326/s
ffor 3389/s
ifor 3565/s
sfforc 4920/s
sifor 6445/s
smap 8908/s
slic 9698/s


5.008009 c -10
2 ** 16 == 65536
Rate mmap
mmap 2.36/s
simap2 2.37/s
simap 2.37/s
imap 2.51/s
sffor 5.77/s
fforc 6.03/s
ffor 6.42/s
ifor 6.53/s
sfforc 7.95/s
smap 8.75/s
sifor 8.83/s
slic 9.82/s


5.008009 c -10
2 ** 18 == 2621442


**too few iter
Rate imap
imap 0.187/s
mmap 0.259/s
simap2 0.298/s
simap 0.302/s
sffor 1.44/s
fforc 1.50/s
ffor 1.60/s
ifor 1.63/s
smap 1.68/s
sfforc 1.75/s
slic 1.91/s
sifor 1.92/s
[download]

5.012002 c -10
2 ** 8 == 256
Rate mmap
mmap 1364/s
imap 1456/s
simap 1734/s
simap2 1742/s
fforc 3086/s
sffor 3224/s
ffor 3479/s
ifor 3551/s
sfforc 5073/s
sifor 6343/s
smap 8775/s
slic 9969/s


5.012002 c -10
2 ** 16 == 65536
Rate mmap
mmap 1.99/s
imap 2.01/s
simap 3.10/s
simap2 3.11/s
sffor 6.63/s
fforc 6.81/s
ffor 7.27/s
ifor 7.35/s
sfforc 9.58/s
sifor 10.5/s
smap 10.6/s
slic 12.1/s


5.012002 c -10
2 ** 18 == 2621442


**too few iter
Rate imap
imap 0.211/s
mmap 0.242/s
simap2 0.406/s
simap 0.409/s
sffor 1.67/s
fforc 1.75/s
ffor 1.85/s
ifor 1.89/s
sfforc 2.11/s
smap 2.11/s
sifor 2.32/s
slic 2.46/s
[download]

5.014001 c -10
2 ** 8 == 256
Rate mmap
mmap 2056/s
imap 2163/s
simap 2167/s
simap2 2191/s
sfforc 3280/s
fforc 3339/s
sffor 3799/s
ffor 3854/s
sifor 3880/s
ifor 3897/s
smap 4659/s
slic 4748/s


5.014001 c -10
2 ** 16 == 65536
Rate imap
imap 2.07/s
mmap 2.44/s
simap2 2.59/s
simap 2.60/s
sfforc 7.01/s
fforc 7.13/s
sifor 7.58/s
sffor 7.63/s
slic 7.72/s
smap 7.74/s
ffor 7.75/s
ifor 7.81/s


5.014001 c -10
2 ** 18 == 2621440
**too few iter
**too few iter
**too few iter
Rate imap
imap 0.156/s
mmap 0.173/s
simap2 0.177/s
simap 0.177/s
sfforc 1.62/s
fforc 1.63/s
slic 1.67/s
smap 1.67/s
sifor 1.75/s
sffor 1.76/s
ffor 1.77/s
ifor 1.79/s
[download]

5.016001 c -10
2 ** 8 == 256
Rate mmap
mmap 2227/s
simap 2282/s
simap2 2283/s
imap 2321/s
sfforc 3331/s
fforc 3382/s
sffor 3484/s
ffor 3575/s
sifor 3772/s
ifor 3831/s
smap 4733/s
slic 4829/s


5.016001 c -10
2 ** 16 == 65536
Rate imap
imap 2.06/s
mmap 2.48/s
simap 2.59/s
simap2 2.65/s
sfforc 6.90/s
smap 6.92/s
slic 7.01/s
sffor 7.07/s
fforc 7.12/s
ffor 7.33/s
sifor 7.43/s
ifor 7.69/s


5.016001 c -10
2 ** 18 == 2621440
**too few iter
**too few iter
**too few iter
Rate imap
imap 0.156/s
mmap 0.176/s
simap2 0.177/s
simap 0.178/s
smap 1.45/s
slic 1.47/s
sfforc 1.60/s
fforc 1.63/s
sffor 1.65/s
ffor 1.68/s
sifor 1.70/s
ifor 1.75/s
[download]

[reply]
[d/l]
[select]


Do you know where your variables are?
	PerlMonks