http://qs321.pair.com?node_id=706501

brycen has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

What's the most efficient way to create and reference a key only hash (In general, but specifically in a mod-perl environment)? Is it:

#!/usr/bin/perl -w use strict; my %hash1 = ( shave => '', the => '', modern => '', way => '', ); my %hash2 = ( shave => 1, the => 1, modern => 1, way => 1, ); my $word = 'modern'; if( exists($hash1{$word}) ) { print "Word $word is in hash1\n"; } if( $hash2{$word} ) { print "Word $word is in hash2\n"; }
I am aware of  perl is a profligate wastrel when it comes to memory use. But the -DL and warn(!!!) hacks don't work on my Perl v5.8.8. I don't get any memory debugging with:
export PERL_DEBUG_MSTATS=1 perl test_hasheffeciency.pl

Replies are listed 'Best First'.
Re: Effecicncy of key-only hash
by tilly (Archbishop) on Aug 24, 2008 at 07:13 UTC
    A string is larger than an integer is larger than undef. Proof:
    #! /usr/bin/perl -w use strict; use Devel::Size qw(total_size); my %hash1 = ( shave => '', the => '', modern => '', way => '', ); my %hash2 = ( shave => 1, the => 1, modern => 1, way => 1, ); my %hash3 = ( shave => undef, the => undef, modern => undef, way => undef, ); print "1: " . total_size(\%hash1) . "\n"; print "2: " . total_size(\%hash2) . "\n"; print "3: " . total_size(\%hash3) . "\n"; __END__ 1: 309 2: 261 3: 245
    If you want the fastest way to initialize that hash, I believe it is
    my %hash; undef(@hash{qw(shave the modern way)});
    but your maintenance programmer may have something nasty to say about that.
      undef(@hash{qw(shave the modern way)});
      Do @hash{qw(shave the modern way)}=(); instead. It's arguably a bug that the former creates the shave, the, and modern keys if they don't exist. Note that it does not set their values to undef if they do already exist.
        Many years ago there was a discussion on p5p about the fastest way to initialize an empty hash. I forget who it was who brought up that construct as the fastest possible way to do it, but I do remember it was someone who should know. Maybe Nick Ing-Simmons, but I won't swear to it. However at the time it was certainly faster than assigning an empty list because all other versions created temporary intermediate scalars and that one does not.

        Of course now, many versions later, it might not be still true. But that fragment has stuck in my head.

        Please note that I included that version for amusement, and not for serious use. Which I indicated with my comment about the maintenance programmer's response. Which comment has been confirmed by the questions and complaints we've had. :-)

      I know it isn't my thread but I had to ask: Can you please explain what you are doing here:
      undef(@hash{qw(shave the modern way)});
      Why you accessing a hash as an array ? ('@' char)? I am missing something I think ...
        The @ tells you that what you expect to return is an array, and the curly braces {...} tell you that what you access is a hash.

        qw(shave the modern way) is a list, so @hash{qw(shave the modern way)} is a list ("slice") of items in %hash with the listed keys.

        It's a hash slice. See perldata for more information on hash and array slices
Re: Effeciency of key-only hash
by moritz (Cardinal) on Aug 24, 2008 at 09:03 UTC
    I'd do it with 1 as the hash values, because then you can simply write if ($hash{$key}), and hell doesn't break loose if you happen to forget the exists in front of it. And speaking from my own experience, at one point you will forget it.

      I agree that it's more comfortable to omit the "exists". However if you want to work with hash slices, it's not that easy to use "1"s.

      tilly's code:

      undef(@hash{qw(shave the modern way)});

      would have to be written like this:

      @hash{qw(shave the modern way)} = (1) x 4; # cumbersome
        I always do those inits as:
        my %hash = map {($_=>1)} qw(shave the modern way);
        I always found it to be a little more maintainable (i.e. readable) than the x operator trick.


        -pete
        "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
        The fact is, the exists seems to be really faster:
        use Benchmark qw(cmpthese); use strict; use warnings; $\="\n"; cmpthese 1000000, { empty_strings => sub { my %h = ( shave => '', the => '', modern => '', way => '', ); my $mod = defined $h{modern}; my $ant = not defined $h{antique}; our $went_empty; print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_empty++ +; }, ones => sub { my %h = ( shave => 1, the => 1, modern => 1, way => 1, ); my $mod = $h{modern}; my $ant = not $h{antique}; our $went_one; print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_one++; }, undefs => sub { my %h = ( shave => undef, the => undef, modern => undef, way => undef, ); my $mod = exists $h{modern}; my $ant = not exists $h{antique}; our $went_undef; print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_undef++ +; }, one_big_undef => sub { my %h; undef @h{qw{shave the modern way}}; my $mod = exists $h{modern}; my $ant = not exists $h{antique}; our $went_big; print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_big++; } }
        gives:
        y, y
        y, y
        y, y
        y, y
                          Rate empty_strings          ones        undefs one_big_undef
        empty_strings 319489/s            --          -18%          -20%          -29%
        ones          390625/s           22%            --           -2%          -14%
        undefs        398406/s           25%            2%            --          -12%
        one_big_undef 452489/s           42%           16%           14%            --
        
        
        []s, HTH, Massa (κς,πμ,πλ)
Re: Effecicncy of key-only hash
by kyle (Abbot) on Aug 24, 2008 at 12:45 UTC
Re: Effecicncy of key-only hash
by Skeeve (Parson) on Aug 24, 2008 at 13:09 UTC

    I usually do it this way:

    • If I want to initialize a hash just to know whether or not a word has been seen:
      my %hash1; @hash1{qw/shave the modern way/}= ();
    • If it's done in a loop, when I read word one after the other, maybe from a database query or some other input
      ++$hash{$the_word_just_read};
      Even if I don't need the actual counter, I prefer this over
      $hash{$the_word_just_read}= undef; # or 1 or ''

    To be honest: I don't really care for the efficiency of that.


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re: Effecicncy of key-only hash
by FunkyMonk (Chancellor) on Aug 24, 2008 at 22:30 UTC
    As others have suggested, I'd do it like this
    my %hash = map { $_ => 1 } qw{ shave the modern way };

    because, wherever possible, I want to keep declaration with initialisation.

    It's probably a personal thing, but it really bugs me when I've got to separate them, as in

    my $str; $str = $_ x $freq{$_} for sort keys %freq;

    which I had to write earlier today.

    If you're really that concerned about speed, do it in C instead.
    If you're really that concerned about memory, get more. It's really cheap nowadays.


    Unless I state otherwise, all my code runs with strict and warnings
Re: Effecicncy of key-only hash
by aufflick (Deacon) on Aug 25, 2008 at 04:15 UTC
    tilly is right, and if you want to know why you might like to read Perl Guts Illustrated which is really quite interesting.

    It also makes sense that exists is faster than checking the truth of the value, but you will then need to be careful about auto-vivification. Also if you plan on doing a lot of removing from the set you may want to compare the performance of undef-ing a hash entry versus delete-ing it.

Re: Effecicncy of key-only hash
by brycen (Monk) on Aug 25, 2008 at 16:53 UTC
    Thanks for all the great answers, especially those with code samples and benchmarks! It looks like the syntax and space efficiency winner is Set::Light, though not by much (it uses a C module to point all the set members to a single object).

    Remember I'm in mod_perl, so construction/deconstruction time is less important than memory footprint and lookup time.

    #! /usr/bin/perl -w use strict; use Devel::Size qw(total_size); use Set::Light; my $lookup = "shave"; my %hash1 = ( shave => '', the => '', very => '', modern => '', way => '', ); my %hash2 = ( shave => 1, the => 1, very => 1, modern => 1, way => 1, ); my %hash3 = ( shave => undef, the => undef, very => undef, modern => undef, way => undef, ); print "Hash3 " . ((exists $hash3{$lookup})?"contains":"does not contai +n") . " $lookup\n"; my $foo = undef; my %hash4 = ( shave => \$foo, the => \$foo, very => \$foo, modern => \$foo, way => \$foo, ); my $set = Set::Light->new( qw/shave the very modern way/ ); print "Set " . (($set->has($lookup))?"contains":"does not contain") . +" $lookup\n"; print "Size 1: " . total_size(\%hash1) . "\n"; print "Size 2: " . total_size(\%hash2) . "\n"; print "Size 3: " . total_size(\%hash3) . "\n"; print "Size 4: " . total_size(\%hash4) . "\n"; print "Size S: " . total_size(\$set) . "\n"; __END__
    Hash3 contains shave
    Set contains shave
    Size 1: 363
    Size 2: 303
    Size 3: 283
    Size 4: 315
    Size S: 251