Re: Space Efficiency of Hashes

Well, nothing beats a good TIAS with Devel::Size in the mix:

#!/usr/bin/perl

use strict;
use warnings;

my @printables = map chr, 33 .. 126;

# random string 30 characters long
sub randstr { join '', map { $printables[rand(@printables)] } 1..30 }

use Devel::Size qw(size total_size);

sub commas {
    my $str = ''.reverse shift;
    return scalar reverse join ',', grep length, split /(.{3})/, $str
}

# Call randstr() explicitly: a bare "randstr =>" is auto-quoted into the literal key 'randstr'
my %hash = ( randstr() => randstr() );
for (0..5) {
    my $count = scalar keys %hash;

    print commas($count), " elements: ", commas(total_size(\%hash)), " bytes\n";

    for (1..9 * $count) {
        my $s = randstr;
        $hash{$s} = randstr;
    }
    
}

my $count = scalar keys %hash;
print commas($count), " elements: ", commas(total_size(\%hash)), " bytes\n";
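The commas helper is the least obvious part of the script, so here is a standalone sketch of the same trick: reverse the digits, split into groups of three, join with commas, and reverse back. The sample inputs are just illustrative values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Insert thousands separators into a non-negative integer.
sub commas {
    my ($n) = @_;
    my $str = reverse "$n";                  # scalar context: reverses the string
    # The capturing split yields 3-char groups plus empty strings; grep drops the empties
    my @groups = grep { length } split /(.{3})/, $str;
    return scalar reverse join ',', @groups; # reverse back into reading order
}

print commas($_), "\n" for 100, 1000, 1135573;
```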

These are the results it printed out on my machine, for v5.8.4:

1 elements: 176 bytes
10 elements: 1,171 bytes
100 elements: 11,249 bytes
1,000 elements: 111,133 bytes
10,000 elements: 1,135,573 bytes
100,000 elements: 11,224,325 bytes
1,000,000 elements: 111,194,341 bytes

(That last one took several minutes to complete on my system...)

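Taking my last result and multiplying by five (5 x 111,194,341 is about 556 million bytes, or roughly 111 bytes per entry), it seems that 5 million {acc, build} pairs will take only slightly more than half a gig of RAM to store all those hash entries. That assumes keys and values of about 30 characters each, as in the test; longer strings will cost proportionally more.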
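For anyone who wants to redo the arithmetic with their own numbers, here is a minimal sketch of the extrapolation. It assumes memory grows linearly with the entry count, which the measurements above suggest but do not guarantee; estimate_bytes is a made-up helper name, not part of Devel::Size.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Measurement taken from the 1,000,000-element run above
my $measured_bytes   = 111_194_341;
my $measured_entries = 1_000_000;

# Hypothetical helper: linear extrapolation from that single measurement
sub estimate_bytes {
    my ($target_entries) = @_;
    return $measured_bytes * $target_entries / $measured_entries;
}

my $estimate = estimate_bytes(5_000_000);

printf "%.1f bytes per entry\n", $measured_bytes / $measured_entries;
printf "%.0f bytes (%.1f MiB) for 5,000,000 entries\n",
    $estimate, $estimate / 2**20;
```

Swap in your own total_size figure and entry count if your keys and values are a different length from the 30-character strings used in the test.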

So if your machine has 2GB of RAM, you should be good to go.

--Stevie-O

$"=$,,$_=q>|\p4<6 8p<M/_|<('=>
.q>.<4-KI<l|2$<6%s!<qn#F<>;$,
.=pack'N*',"@{[unpack'C*',$_]
}"for split/</;$_=$,,y[A-Z a-z]
         {}cd;print lc