
Re: Space Efficiency of Hashes

by Stevie-O (Friar)
on Mar 16, 2005 at 00:52 UTC

in reply to Space Efficiency of Hashes

Well, nothing beats a good TIAS (Try It And See) with Devel::Size in the mix:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @printables = map chr, 33 .. 126;

    # random string 30 characters long
    sub randstr {
        join '', map { $printables[rand(@printables)] } 1..30
    }

    use Devel::Size qw(size total_size);

    # insert thousands separators: 1234567 -> 1,234,567
    sub commas {
        my $str = ''.reverse shift;
        return scalar reverse join ',', grep length, split /(.{3})/, $str
    }

    my %hash = ( randstr => randstr );

    # grow the hash tenfold on each pass, reporting its size first
    for (0..5) {
        my $count = scalar keys %hash;
        print commas($count), " elements: ", commas(total_size\%hash), " bytes\n";
        for (1..9 * $count) {
            my $s = randstr;
            $hash{$s} = randstr;
        }
    }

    my $count = scalar keys %hash;
    print commas($count), " elements: ", commas(total_size\%hash), " bytes\n";

These are the results it printed out on my machine, for v5.8.4:

    1 elements: 176 bytes
    10 elements: 1,171 bytes
    100 elements: 11,249 bytes
    1,000 elements: 111,133 bytes
    10,000 elements: 1,135,573 bytes
    100,000 elements: 11,224,325 bytes
    1,000,000 elements: 111,194,341 bytes

(That last one took several minutes to complete on my system...)

Taking my last result and multiplying by five (about 556 million bytes), it seems that 5 million {acc, build} pairs will take only slightly more than half a gig of RAM to store all those hash entries.

So, if your machine has 2GB of RAM -- you should be good to go.
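
For anyone who wants to redo the extrapolation with their own numbers, here is a minimal sketch of the arithmetic. It assumes that memory grows roughly linearly with the entry count (which the results above suggest) and that your real keys and values are about the same size as my 30-character test strings; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Measured above: total_size of a hash holding 1,000,000 entries (v5.8.4).
my $measured_bytes   = 111_194_341;
my $measured_entries = 1_000_000;

# Planned number of {acc, build} pairs from the original question.
my $target_entries = 5_000_000;

# Linear extrapolation (an assumption; real keys and values may differ in size).
my $bytes_per_entry = $measured_bytes / $measured_entries;
my $estimate_bytes  = $measured_bytes * $target_entries / $measured_entries;

printf "%.1f bytes per entry\n", $bytes_per_entry;
printf "estimated total: %.0f bytes (%.1f MiB)\n",
    $estimate_bytes, $estimate_bytes / 2**20;
```

If your keys or values are much longer than 30 characters, rerun the Devel::Size test with representative data rather than scaling these figures.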

$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Re^2: Space Efficiency of Hashes
by Anonymous Monk on Mar 16, 2005 at 03:45 UTC

    Thanks (and to Joost as well): this indicates that I'll max out under 2 GB of RAM, leaving plenty free for other stuff running. I didn't think of just going ahead and testing overhead like that, but it seems a really obvious way to approach it in retrospect! Thanks for the enlightenment.

      You can also use Tie::SubsrHash, and the maximum memory required for your program will be around 300 MB.
        Nice approach. Fixed-size records would never have occurred to me. The correct name is Tie::SubstrHash.
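
For reference, a minimal sketch of what the Tie::SubstrHash approach could look like. The key and value widths, the table size, and the sample data below are made-up values for illustration, not taken from the original question. The module stores everything in one fixed-size table, so every key and value must be exactly the declared length (hence the padding), and the table size has to be chosen up front.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::SubstrHash;

# Hypothetical sizes: 8-byte keys, 8-byte values, room for about 100 entries.
my ($key_len, $value_len, $table_size) = (8, 8, 100);

tie my %pairs, 'Tie::SubstrHash', $key_len, $value_len, $table_size;

# Pad (or validate) keys and values to the fixed widths before storing;
# the module rejects strings of any other length.
my $key   = sprintf '%-*s', $key_len,   'acc00001';
my $value = sprintf '%-*s', $value_len, 'build42';

$pairs{$key} = $value;

my $fetched = $pairs{$key};
print "[$fetched]\n";
```

Note that this module does not support exists(), so look entries up by fetching and checking for undef instead.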