Re: Effecicncy of key-only hash
by tilly (Archbishop) on Aug 24, 2008 at 07:13 UTC
|
A string is larger than an integer is larger than undef. Proof:
#! /usr/bin/perl -w
use strict;
use Devel::Size qw(total_size);
my %hash1 = (
shave => '',
the => '',
modern => '',
way => '',
);
my %hash2 = (
shave => 1,
the => 1,
modern => 1,
way => 1,
);
my %hash3 = (
shave => undef,
the => undef,
modern => undef,
way => undef,
);
print "1: " . total_size(\%hash1) . "\n";
print "2: " . total_size(\%hash2) . "\n";
print "3: " . total_size(\%hash3) . "\n";
__END__
1: 309
2: 261
3: 245
If you want the fastest way to initialize that hash, I believe it is
my %hash;
undef(@hash{qw(shave the modern way)});
but your maintenance programmer may have something nasty to say about that. | [reply] [d/l] [select] |
|
undef(@hash{qw(shave the modern way)});
Do @hash{qw(shave the modern way)}=(); instead. It's arguably a bug that the former creates the shave, the, and modern keys if they don't exist. Note that it does not set their values to undef if they do already exist.
| [reply] [d/l] |
|
Many years ago there was a discussion on p5p about the fastest way to initialize an empty hash. I forget who it was who brought up that construct as the fastest possible way to do it, but I do remember it was someone who should know. Maybe Nick Ing-Simmons, but I won't swear to it. However at the time it was certainly faster than assigning an empty list because all other versions created temporary intermediate scalars and that one does not.
Of course now, many versions later, it might not be still true. But that fragment has stuck in my head.
Please note that I included that version for amusement, and not for serious use. Which I indicated with my comment about the maintenance programmer's response. Which comment has been confirmed by the questions and complaints we've had. :-)
| [reply] |
|
|
|
|
|
|
I know it isn't my thread but I had to ask: Can you please explain what you are doing here:
undef(@hash{qw(shave the modern way)});
Why you accessing a hash as an array ? ('@' char)?
I am missing something I think ... | [reply] [d/l] |
|
| [reply] [d/l] [select] |
|
It's a hash slice. See perldata for more information on hash and array slices
| [reply] |
Re: Effeciency of key-only hash
by moritz (Cardinal) on Aug 24, 2008 at 09:03 UTC
|
I'd do it with 1 as the hash values, because then you can simply write if ($hash{$key}), and hell doesn't break loose if you happen to forget the exists in front of it. And speaking from my own experience, at one point you will forget it. | [reply] [d/l] [select] |
|
undef(@hash{qw(shave the modern way)});
would have to be written like this:
@hash{qw(shave the modern way)} = (1) x 4; # cumbersome
| [reply] [d/l] [select] |
|
I always do those inits as:
my %hash = map {($_=>1)} qw(shave the modern way);
I always found it to be a little more maintainable (i.e. readable) than the x operator trick.
-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
| [reply] [d/l] |
|
|
|
|
|
The fact is, the exists seems to be really faster:
use Benchmark qw(cmpthese);
use strict;
use warnings;
$\="\n";
cmpthese 1000000, {
empty_strings => sub {
my %h = (
shave => '',
the => '',
modern => '',
way => '',
);
my $mod = defined $h{modern};
my $ant = not defined $h{antique};
our $went_empty;
print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_empty++
+;
},
ones => sub {
my %h = (
shave => 1,
the => 1,
modern => 1,
way => 1,
);
my $mod = $h{modern};
my $ant = not $h{antique};
our $went_one;
print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_one++;
},
undefs => sub {
my %h = (
shave => undef,
the => undef,
modern => undef,
way => undef,
);
my $mod = exists $h{modern};
my $ant = not exists $h{antique};
our $went_undef;
print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_undef++
+;
},
one_big_undef => sub {
my %h;
undef @h{qw{shave the modern way}};
my $mod = exists $h{modern};
my $ant = not exists $h{antique};
our $went_big;
print( ($mod?'y':'n'), ', ', ($ant?'y':'n') ) unless $went_big++;
}
}
gives:
y, y
y, y
y, y
y, y
Rate empty_strings ones undefs one_big_undef
empty_strings 319489/s -- -18% -20% -29%
ones 390625/s 22% -- -2% -14%
undefs 398406/s 25% 2% -- -12%
one_big_undef 452489/s 42% 16% 14% --
[]s, HTH, Massa (κς,πμ,πλ)
| [reply] [d/l] [select] |
|
|
Re: Effecicncy of key-only hash
by kyle (Abbot) on Aug 24, 2008 at 12:45 UTC
|
| [reply] |
Re: Effecicncy of key-only hash
by Skeeve (Parson) on Aug 24, 2008 at 13:09 UTC
|
I usually do it this way:
- If I want to initialize a hash just to know whether or not a word has been seen:
my %hash1;
@hash1{qw/shave the modern way/}= ();
- If it's done in a loop, when I read word one after the other, maybe from a database query or some other input
++$hash{$the_word_just_read};
Even if I don't need the actual counter, I prefer this over
$hash{$the_word_just_read}= undef; # or 1 or ''
To be honest: I don't really care for the efficiency of that.
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
| [reply] [d/l] [select] |
Re: Effecicncy of key-only hash
by FunkyMonk (Chancellor) on Aug 24, 2008 at 22:30 UTC
|
As others have suggested, I'd do it like this
my %hash = map { $_ => 1 } qw{ shave the modern way };
because, wherever possible, I want to keep declaration with initialisation.
It's probably a personal thing, but it really bugs me when I've got to separate them, as in
my $str;
$str = $_ x $freq{$_} for sort keys %freq;
which I had to write earlier today.
If you're really that concerned about speed, do it in C instead.
If you're really that concerned about memory, get more. It's really cheap nowadays.
| [reply] [d/l] [select] |
Re: Effecicncy of key-only hash
by aufflick (Deacon) on Aug 25, 2008 at 04:15 UTC
|
tilly is right, and if you want to know why you might like to read Perl Guts Illustrated which is really quite interesting.
It also makes sense that exists is faster than checking the truth of the value, but you will then need to be careful about auto-vivification. Also if you plan on doing a lot of removing from the set you may want to compare the performance of undef-ing a hash entry versus delete-ing it. | [reply] |
Re: Effecicncy of key-only hash
by brycen (Monk) on Aug 25, 2008 at 16:53 UTC
|
Thanks for all the great answers, especially those with code samples and benchmarks! It looks like the syntax and space efficiency winner is Set::Light, though not by much (it uses a C module to point all the set members to a single object).
Remember I'm in mod_perl, so construction/deconstruction time is less important than memory footprint and lookup time.
#! /usr/bin/perl -w
use strict;
use Devel::Size qw(total_size);
use Set::Light;
my $lookup = "shave";
my %hash1 = (
shave => '',
the => '',
very => '',
modern => '',
way => '',
);
my %hash2 = (
shave => 1,
the => 1,
very => 1,
modern => 1,
way => 1,
);
my %hash3 = (
shave => undef,
the => undef,
very => undef,
modern => undef,
way => undef,
);
print "Hash3 " . ((exists $hash3{$lookup})?"contains":"does not contai
+n") . " $lookup\n";
my $foo = undef;
my %hash4 = (
shave => \$foo,
the => \$foo,
very => \$foo,
modern => \$foo,
way => \$foo,
);
my $set = Set::Light->new( qw/shave the very modern way/ );
print "Set " . (($set->has($lookup))?"contains":"does not contain") .
+" $lookup\n";
print "Size 1: " . total_size(\%hash1) . "\n";
print "Size 2: " . total_size(\%hash2) . "\n";
print "Size 3: " . total_size(\%hash3) . "\n";
print "Size 4: " . total_size(\%hash4) . "\n";
print "Size S: " . total_size(\$set) . "\n";
__END__
Hash3 contains shave
Set contains shave
Size 1: 363
Size 2: 303
Size 3: 283
Size 4: 315
Size S: 251
| [reply] [d/l] |