jaa has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I have a function:
sub words {
return [
map {
{ # substitute these
'DUTCH' => 'NETHERLANDS',
'GERMANY' => 'DEUTSCHLAND',
'AUST.' => 'AUSTRALIA',
}->{$_}
or
$_
}
grep {
!{ # skip these
'BANK' => 1,
'CORP' => 1,
'GOVERNMENT' => 1,
'GOVT' => 1,
'LIMITED' => 1,
'LTD' => 1,
'NPV' => 1,
'COM' => 1,
}->{$_}
} split /\s+/, shift ];
}
My question is: will Perl optimise this, specifically the creation of the two anon hashes? Or am I better off programmatically allocating these into hashes with named storage?
Re: Does Perl do constant anonymous hash creation optimisation?
by BrowserUk (Patriarch) on Jul 08, 2006 at 10:06 UTC
Even if Perl did detect the constant nature of the inline hashes, which I seriously doubt it could, something like this is probably clearer:
use constant {
SUBSTITUTES => { # substitute these
'DUTCH' => 'NETHERLANDS',
'GERMANY' => 'DEUTSCHLAND',
'AUST.' => 'AUSTRALIA',
},
SKIPWORDS => { # skip these
'BANK' => 1,
'CORP' => 1,
'GOVERNMENT' => 1,
'GOVT' => 1,
'LIMITED' => 1,
'LTD' => 1,
'NPV' => 1,
'COM' => 1,
},
};
sub words {
return [
map {
SUBSTITUTES->{$_} or $_
} grep {
!SKIPWORDS->{$_}
} split /\s+/, shift ];
}
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Thanks for that - have never really used
use constant {...};
we usually do this at the top of the package
our $SKIPWORDS = {};
so I wonder if there is an overhead for use constant?
The thing I like about the inline/anon hashes is that all the logic and data are clearly seen in one place.
In this specific example, the package is 5000+ lines long, and the words() function, because it starts with w..., will end up near the end of the file - 5000 lines away from the skip and substitute words.
Will probably go with use constant if there is no overhead - this func will get called a lot in an inner loop, anywhere up to a million times per run.
Thanks again, for the pointers,
Jeff
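One way to keep the data next to the function while still building the hashes only once is to wrap both in a bare block with lexical hashes. This is only a sketch of that idea, not something proposed in the thread; the names %substitute and %skip are illustrative assumptions, and the sample input is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Assumed names: %substitute and %skip are illustrative only.
    # Both hashes are built once, when this block runs, and are
    # visible only to words(). The block must have run before
    # words() is first called, otherwise the hashes are still empty.
    my %substitute = (
        'DUTCH'   => 'NETHERLANDS',
        'GERMANY' => 'DEUTSCHLAND',
        'AUST.'   => 'AUSTRALIA',
    );
    my %skip = map { $_ => 1 }
        qw(BANK CORP GOVERNMENT GOVT LIMITED LTD NPV COM);

    sub words {
        my ($text) = @_;
        return [
            map  { $substitute{$_} || $_ }   # substitute these
            grep { !$skip{$_} }              # skip these
            split /\s+/, $text
        ];
    }
}

print join( ' ', @{ words('GERMANY GOVT BONDS') } ), "\n";
```

The data stays a few lines above the function body, and nothing is rebuilt per call.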
Using constant is 3 times faster than your inline construct.
"In this specific example, the package is 5000+ lines long, and the words() function because it starts with w... will end up near the end of the file - 5000 lines away from the skip and substitute words."
With that much data, I'd definitely put it in a separate file on its own. If you do not want to go to the bother of wrapping it up as a module, you could put the constant hash definition into a file of its own and use a simple do 'wordshash.pl'; just before the associated words() function, though it would probably be better to wrap the function and data into a module and do something like use My::Words qw[ words ]; in the main code.
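A minimal sketch of the module idea follows. It assumes the package name My::Words from the suggestion above; to stay runnable as a single file, the package is declared inline and words() is called by its fully qualified name. In a real layout the package part would presumably live in My/Words.pm, export words() via Exporter, and be loaded with use My::Words qw[ words ];. The sample input is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: in a real project everything from here down to
# "package main" would sit in its own file, My/Words.pm.
package My::Words;

use constant {
    SUBSTITUTES => {    # substitute these
        'DUTCH'   => 'NETHERLANDS',
        'GERMANY' => 'DEUTSCHLAND',
        'AUST.'   => 'AUSTRALIA',
    },
    SKIPWORDS => {      # skip these
        'BANK' => 1, 'CORP' => 1, 'GOVERNMENT' => 1, 'GOVT' => 1,
        'LIMITED' => 1, 'LTD' => 1, 'NPV' => 1, 'COM' => 1,
    },
};

sub words {
    my ($text) = @_;
    return [
        map  { SUBSTITUTES->{$_} || $_ }
        grep { !SKIPWORDS->{$_} }
        split /\s+/, $text
    ];
}

package main;

# Fully qualified call, because nothing is exported in this sketch.
print join( ' ', @{ My::Words::words('DUTCH BANK LTD HOLDINGS') } ), "\n";
```

Data and function travel together this way, so the distance between them in a 5000-line package stops being an issue.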
use constant SKIPWORDS => {...};
our $SKIPWORDS = {};
So a big thumbs up for use constant()!
Thanks for the help!
Re: Does Perl do constant anonymous hash creation optimisation?
by Thilosophy (Curate) on Jul 08, 2006 at 09:15 UTC
The following (flawed, see below) experiment (prints out the memory location of the static hash) seems to suggest that Perl indeed creates the hash only once:
#!/usr/bin/perl
sub static_hash {
print { one => 1, two => 2 };
print $/;
}
static_hash;
static_hash;
static_hash;
planz$ perl /tmp/static.pl
HASH(0x1801380)
HASH(0x1801380)
HASH(0x1801380)
Update:
Okay, forget about that, this just shows that a hash gets created in the same memory location. It could still be a new hash every time. In fact, changing the experiment to use a fresh hash yields exactly the same output:
#!/usr/bin/perl
sub static_hash {
print { one => $_[0], two => time };
print $/;
}
static_hash(8);
static_hash(9);
static_hash(10);
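The point of the update can be checked more directly. In the experiments above each hash is freed as soon as it has been printed, so the next one is free to reuse the same address. The sketch below keeps the returned references alive instead and compares their addresses with refaddr from Scalar::Util; the sub name fresh_hash is an illustrative assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Same constant-looking anonymous hash as in the first experiment,
# but returned to the caller instead of printed and discarded.
sub fresh_hash {
    return { one => 1, two => 2 };
}

# Keep all three references alive at the same time.
my @kept = ( fresh_hash(), fresh_hash(), fresh_hash() );

my %seen;
$seen{ refaddr($_) }++ for @kept;

print "$_\n" for @kept;
print scalar( keys %seen ), " distinct addresses\n";
```

Because all three hashes exist at once, they cannot share an address, so three distinct addresses are reported: the sub builds a new hash on every call.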
It definitely does -not- get optimized.
perl -e 'my %bz = (x => 2, y=> 3, z=>4); sub baz { my $x = \%bz }; sub foo { my $x = { x => 1, y => 2, z=>3 } } sub bar { my $x = {x=>shift, y=>shift, z=>shift} } use Benchmark; timethese(-1,{foo=>\&foo,bar=>\&bar,baz=>\&baz});'
Benchmark: running bar, baz, foo for at least 1 CPU seconds...
bar: 2 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 607349.51/s (n=625570)
baz: 1 wallclock secs ( 1.07 usr + -0.01 sys = 1.06 CPU) @ 3462279.25/s (n=3670016)
foo: 2 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 618264.15/s (n=655360)
Note the decimal point.
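The same kind of comparison can be extended to the use constant variant discussed elsewhere in this thread. What follows is only a sketch: the sub names, the shortened skip list, and the sample tokens are assumptions, and no timings are quoted because they depend on the machine and the Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed, shortened skip list; the names below are illustrative.
use constant SKIP_CONST => { BANK => 1, CORP => 1, LTD => 1 };
my %skip_lexical = ( BANK => 1, CORP => 1, LTD => 1 );

my @tokens = qw(ACME BANK HOLDINGS LTD);

# Builds a new anonymous hash for every element tested.
sub inline_hash {
    return [ grep { !{ BANK => 1, CORP => 1, LTD => 1 }->{$_} } @tokens ];
}

# Looks up in a hash reference created once at compile time.
sub constant_hash {
    return [ grep { !SKIP_CONST->{$_} } @tokens ];
}

# Looks up in a lexical hash created once at run time.
sub lexical_hash {
    return [ grep { !$skip_lexical{$_} } @tokens ];
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    inline   => \&inline_hash,
    constant => \&constant_hash,
    lexical  => \&lexical_hash,
} );
```

All three subs return the same list, so any difference in the table comes from how the lookup hash is obtained.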
Re: Does Perl do constant anonymous hash creation optimisation?
by muba (Priest) on Jul 08, 2006 at 16:03 UTC
This is not as much about the technical aspect of your project, but I'll tell you anyway. The Dutch name for our country is Nederland.
'AUST.' => 'OZ',
'ENZED' => 'KIWI',
'ET' => 'CETERA'