Efficiency of key-only hash

brycen has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

What's the most efficient way to create and reference a key-only hash (in general, but specifically in a mod_perl environment)? Is it:

#!/usr/bin/perl -w
use strict;

my %hash1 = (
    shave  => '',
    the    => '',
    modern => '',
    way    => '',
);

my %hash2 = (
    shave  => 1,
    the    => 1,
    modern => 1,
    way    => 1,
);

my $word   = 'modern';

if( exists($hash1{$word}) ) {
    print "Word $word is in hash1\n";
    }
if( $hash2{$word} ) {
    print "Word $word is in hash2\n";
    }

I am aware that Perl is a profligate wastrel when it comes to memory use. But the -DL and warn(!!!) hacks don't work on my Perl v5.8.8. I don't get any memory debugging with:

export PERL_DEBUG_MSTATS=1
perl test_hasheffeciency.pl


Replies are listed 'Best First'.
Re: Effecicncy of key-only hash by tilly (Archbishop) on Aug 24, 2008 at 07:13 UTC
A string is larger than an integer is larger than undef. Proof:

#! /usr/bin/perl -w
use strict;
use Devel::Size qw(total_size);

my %hash1 = (
    shave  => '',
    the    => '',
    modern => '',
    way    => '',
);

my %hash2 = (
    shave  => 1,
    the    => 1,
    modern => 1,
    way    => 1,
);

my %hash3 = (
    shave  => undef,
    the    => undef,
    modern => undef,
    way    => undef,
);

print "1: " . total_size(\%hash1) . "\n";
print "2: " . total_size(\%hash2) . "\n";
print "3: " . total_size(\%hash3) . "\n";
__END__

1: 309
2: 261
3: 245

If you want the fastest way to initialize that hash, I believe it is

my %hash;
undef(@hash{qw(shave the modern way)});

but your maintenance programmer may have something nasty to say about that.
Re^2: Effecicncy of key-only hash by ysth (Canon) on Aug 24, 2008 at 18:01 UTC
undef(@hash{qw(shave the modern way)});

Do `@hash{qw(shave the modern way)} = ();` instead. It's arguably a bug that the former creates the shave, the, and modern keys if they don't exist. Note that it does not set their values to undef if they do already exist.
Re^3: Effecicncy of key-only hash by tilly (Archbishop) on Aug 24, 2008 at 22:35 UTC
Many years ago there was a discussion on p5p about the fastest way to initialize an empty hash. I forget who it was who brought up that construct as the fastest possible way to do it, but I do remember it was someone who should know. Maybe Nick Ing-Simmons, but I won't swear to it. However, at the time it was certainly faster than assigning an empty list, because all other versions created temporary intermediate scalars and that one does not. Of course now, many versions later, it might no longer be true. But that fragment has stuck in my head.

Please note that I included that version for amusement, and not for serious use. Which I indicated with my comment about the maintenance programmer's response. Which comment has been confirmed by the questions and complaints we've had. :-)
Re^4: Effecicncy of key-only hash by ysth (Canon) on Aug 24, 2008 at 23:28 UTC
Re^5: Effecicncy of key-only hash by massa (Hermit) on Aug 25, 2008 at 11:39 UTC
Re^4: Effecicncy of key-only hash by brycen (Monk) on Aug 25, 2008 at 06:26 UTC
Re^5: Effecicncy of key-only hash by massa (Hermit) on Aug 25, 2008 at 11:38 UTC
Re^2: Effecicncy of key-only hash by leonidlm (Pilgrim) on Aug 24, 2008 at 08:06 UTC
I know it isn't my thread, but I had to ask. Can you please explain what you are doing here:

undef(@hash{qw(shave the modern way)});

Why are you accessing a hash as an array (the '@' character)? I think I am missing something...
Re^3: Effecicncy of key-only hash by moritz (Cardinal) on Aug 24, 2008 at 08:48 UTC
The `@` tells you that what you get back is a list, and the curly braces `{...}` tell you that what you access is a hash. `qw(shave the modern way)` is a list, so `@hash{qw(shave the modern way)}` is a list (a "slice") of the items in `%hash` with the listed keys.
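To make the slice syntax concrete, here is a minimal sketch. The names `%seen` and `@words` are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

my @words = qw(shave the modern way);

# Assigning an empty list to a hash slice creates every listed key
# with an undef value. The leading @ is there because a slice is a list.
my %seen;
@seen{@words} = ();

# The values are undef, so membership has to be tested with exists.
my $has_modern  = exists $seen{modern}  ? 1 : 0;
my $has_antique = exists $seen{antique} ? 1 : 0;

# A slice can be read as well as written: this fetches two values at once.
my @values = @seen{qw(shave way)};

print "modern: $has_modern, antique: $has_antique, keys: ",
      scalar(keys %seen), "\n";
```

The same `@name{...}` form works for any list of keys, including one held in an array, which is why it is handy for building a lookup set from input.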
Re^3: Effecicncy of key-only hash by FunkyMonk (Chancellor) on Aug 24, 2008 at 08:55 UTC
It's a hash slice. See perldata for more information on hash and array slices.
Re: Effeciency of key-only hash by moritz (Cardinal) on Aug 24, 2008 at 09:03 UTC
I'd do it with `1` as the hash values, because then you can simply write `if ($hash{$key})`, and hell doesn't break loose if you happen to forget the `exists` in front of it. And speaking from my own experience, at some point you will forget it.
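A small sketch of the pitfall described above, assuming two lookup hashes that differ only in their values (the names `%empty` and `%ones` are illustrative):

```perl
use strict;
use warnings;

my %empty = ( shave => '', the => '', modern => '', way => '' );
my %ones  = ( shave => 1,  the => 1,  modern => 1,  way => 1  );

# With '' values, a plain truth test fails even though the key is present.
my $truth_empty  = $empty{modern}        ? 'yes' : 'no';
my $exists_empty = exists $empty{modern} ? 'yes' : 'no';

# With 1 values, the plain truth test is enough.
my $truth_ones = $ones{modern} ? 'yes' : 'no';

print "'' values, truth test:  $truth_empty\n";
print "'' values, exists test: $exists_empty\n";
print "1 values,  truth test:  $truth_ones\n";
```

The first line of output reports `no`, which is the silent failure that a forgotten `exists` produces.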
Re^2: Effeciency of key-only hash by betterworld (Curate) on Aug 24, 2008 at 10:35 UTC
I agree that it's more comfortable to omit the "exists". However, if you want to work with hash slices, it's not that easy to use "1"s. tilly's code:

undef(@hash{qw(shave the modern way)});

would have to be written like this:

@hash{qw(shave the modern way)} = (1) x 4; # cumbersome
Re^3: Effeciency of key-only hash by dreadpiratepeter (Priest) on Aug 24, 2008 at 11:39 UTC
I always do those inits as:

my %hash = map {($_=>1)} qw(shave the modern way);

I always found it to be a little more maintainable (i.e. readable) than the x operator trick.

-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
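For comparison, the two initialisation styles can be put side by side. This is only a sketch; `@words`, `%by_slice` and `%by_map` are illustrative names, and taking the repetition count from the array is one way to avoid the hard-coded 4.

```perl
use strict;
use warnings;

my @words = qw(shave the modern way);

# Slice plus list repetition. An array on the right of x is read as
# its element count, so the number of 1s always matches the keys.
my %by_slice;
@by_slice{@words} = (1) x @words;

# map builds the key/value pairs, so declaration and initialisation
# stay in one statement.
my %by_map = map { ($_ => 1) } @words;

my $slice_keys = join ' ', sort keys %by_slice;
my $map_keys   = join ' ', sort keys %by_map;

print "slice: $slice_keys\n";
print "map:   $map_keys\n";
```

Both hashes end up with the same four keys, each mapped to 1, so the choice between them is mostly about readability.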
Re^4: Effeciency of key-only hash by JadeNB (Chaplain) on Aug 24, 2008 at 20:42 UTC
Re^5: Effeciency of key-only hash by betterworld (Curate) on Aug 24, 2008 at 22:34 UTC
Re^4: Effeciency of key-only hash by Anonymous Monk on Aug 24, 2008 at 11:41 UTC
Re^3: Effeciency of key-only hash by massa (Hermit) on Aug 24, 2008 at 12:57 UTC
The fact is, `exists` really does seem to be faster:

use Benchmark qw(cmpthese);
use strict;
use warnings;

$\ = "\n";

cmpthese 1000000, {
    empty_strings => sub {
        my %h = (
            shave  => '',
            the    => '',
            modern => '',
            way    => '',
        );
        my $mod = defined $h{modern};
        my $ant = not defined $h{antique};
        our $went_empty;
        print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_empty++;
    },
    ones => sub {
        my %h = (
            shave  => 1,
            the    => 1,
            modern => 1,
            way    => 1,
        );
        my $mod = $h{modern};
        my $ant = not $h{antique};
        our $went_one;
        print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_one++;
    },
    undefs => sub {
        my %h = (
            shave  => undef,
            the    => undef,
            modern => undef,
            way    => undef,
        );
        my $mod = exists $h{modern};
        my $ant = not exists $h{antique};
        our $went_undef;
        print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_undef++;
    },
    one_big_undef => sub {
        my %h;
        undef @h{qw{shave the modern way}};
        my $mod = exists $h{modern};
        my $ant = not exists $h{antique};
        our $went_big;
        print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_big++;
    }
};

gives:

y, y
y, y
y, y
y, y
                  Rate empty_strings          ones        undefs one_big_undef
empty_strings 319489/s            --          -18%          -20%          -29%
ones          390625/s           22%            --           -2%          -14%
undefs        398406/s           25%            2%            --          -12%
one_big_undef 452489/s           42%           16%           14%            --

[]s, HTH, Massa (κς,πμ,πλ)
Re^4: Effeciency of key-only hash by kyle (Abbot) on Aug 24, 2008 at 13:05 UTC
Re^5: Effeciency of key-only hash by massa (Hermit) on Aug 24, 2008 at 19:34 UTC
Re: Effecicncy of key-only hash by kyle (Abbot) on Aug 24, 2008 at 12:45 UTC
You might be interested in the discussion attached to Fast Sets of Scalars in Perl. (Set::Light does the same job in less memory than a hash with undef values, but it's slow to destroy.)
Re: Effecicncy of key-only hash by Skeeve (Parson) on Aug 24, 2008 at 13:09 UTC
I usually do it this way. If I want to initialize a hash just to know whether or not a word has been seen:

my %hash1;
@hash1{qw/shave the modern way/}= ();

If it's done in a loop, where I read one word after another, maybe from a database query or some other input:

++$hash{$the_word_just_read};

Even if I don't need the actual counter, I prefer this over

$hash{$the_word_just_read}= undef; # or 1 or ''

To be honest: I don't really care for the efficiency of that.

`s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%` `+`.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
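The counting idiom can be sketched as follows. The `@input` list is a hypothetical stand-in for the database query or other input mentioned above.

```perl
use strict;
use warnings;

# Hypothetical input standing in for words read one at a time.
my @input = qw(shave the modern way the modern the);

# Pre-increment creates the key on first sight and counts repeats after that.
my %count;
++$count{$_} for @input;

# Every seen word now has a true value, so a plain truth test works,
# and the frequencies come along for free.
for my $word (sort keys %count) {
    print "$word: $count{$word}\n";
}
print "antique seen? ", ($count{antique} ? "yes" : "no"), "\n";
```

Incrementing an undefined hash value does not trigger a warning, so the loop stays clean under `use warnings`.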
Re: Effecicncy of key-only hash by FunkyMonk (Chancellor) on Aug 24, 2008 at 22:30 UTC
As others have suggested, I'd do it like this:

my %hash = map { $_ => 1 } qw{ shave the modern way };

because, wherever possible, I want to keep declaration with initialisation. It's probably a personal thing, but it really bugs me when I've got to separate them, as in

my $str;
$str = $_ x $freq{$_} for sort keys %freq;

which I had to write earlier today.

If you're really that concerned about speed, do it in C instead. If you're really that concerned about memory, get more. It's really cheap nowadays.

Unless I state otherwise, all my code runs with strict and warnings
Re: Effecicncy of key-only hash by aufflick (Deacon) on Aug 25, 2008 at 04:15 UTC
tilly is right, and if you want to know why, you might like to read Perl Guts Illustrated, which is really quite interesting. It also makes sense that exists is faster than checking the truth of the value, but you will then need to be careful about auto-vivification. Also, if you plan on doing a lot of removing from the set, you may want to compare the performance of undef-ing a hash entry versus delete-ing it.
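Both cautions can be illustrated with a short sketch. The hash names are illustrative, and "undef-ing" is taken here to mean assigning undef to the entry, which is an assumption about what was meant.

```perl
use strict;
use warnings;

my %set = ( shave => undef, the => undef );

# Testing a single-level key with exists never creates it.
my $before = keys %set;
my $found  = exists $set{antique} ? 1 : 0;
my $after  = keys %set;

# Testing a nested key autovivifies the intermediate level.
my %nested;
my $deep     = exists $nested{outer}{inner} ? 1 : 0;
my $vivified = exists $nested{outer}        ? 1 : 0;

# Assigning undef keeps the key in the set; delete removes it.
$set{shave} = undef;
my $still_there = exists $set{shave} ? 1 : 0;
delete $set{the};
my $gone = exists $set{the} ? 0 : 1;

print "keys before/after lookup: $before/$after\n";
print "inner found: $deep, outer vivified: $vivified\n";
print "undef keeps key: $still_there, delete removes key: $gone\n";
```

For a flat key-only hash the autovivification risk does not arise on lookup. It matters once the set is nested inside another structure.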
Re: Effecicncy of key-only hash by brycen (Monk) on Aug 25, 2008 at 16:53 UTC
Thanks for all the great answers, especially those with code samples and benchmarks! It looks like the syntax and space efficiency winner is Set::Light, though not by much (it uses a C module to point all the set members to a single object). Remember I'm in mod_perl, so construction/deconstruction time is less important than memory footprint and lookup time.

#! /usr/bin/perl -w
use strict;
use Devel::Size qw(total_size);
use Set::Light;

my $lookup = "shave";

my %hash1 = (
    shave  => '',
    the    => '',
    very   => '',
    modern => '',
    way    => '',
);

my %hash2 = (
    shave  => 1,
    the    => 1,
    very   => 1,
    modern => 1,
    way    => 1,
);

my %hash3 = (
    shave  => undef,
    the    => undef,
    very   => undef,
    modern => undef,
    way    => undef,
);
print "Hash3 " . ((exists $hash3{$lookup})?"contains":"does not contain") . " $lookup\n";

my $foo = undef;
my %hash4 = (
    shave  => \$foo,
    the    => \$foo,
    very   => \$foo,
    modern => \$foo,
    way    => \$foo,
);

my $set = Set::Light->new( qw/shave the very modern way/ );
print "Set " . (($set->has($lookup))?"contains":"does not contain") . " $lookup\n";

print "Size 1: " . total_size(\%hash1) . "\n";
print "Size 2: " . total_size(\%hash2) . "\n";
print "Size 3: " . total_size(\%hash3) . "\n";
print "Size 4: " . total_size(\%hash4) . "\n";
print "Size S: " . total_size(\$set) . "\n";
__END__

Hash3 contains shave
Set contains shave
Size 1: 363
Size 2: 303
Size 3: 283
Size 4: 315
Size S: 251
