Re^3: Hash versus chain of elsifs

Re^4: Hash versus chain of elsifs by mldvx4 (Friar) on Nov 22, 2021 at 10:24 UTC
Here is some sample code: `package JunnkSites 0.03; use parent qw(Exporter); our @EXPORT = qw(KnownJunkSite); sub KnownJunkSite { my ($a) = (@_); my %junksites = ( "bollyinside.com", "www.bollyinside.com", ... "worldtrademarkreview.com", "www.worldtrademarkreview.com", ); if(exists($junksites{$a})) { $a = 1; } else { $a = 0; } return($a); }` Any other performance and style tips or pointers welcome.
Re^5: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 11:22 UTC
Some feedback on your posted code:
- Always post an SSCCE. Your hash is not correct (which would have been picked up with an SSCCE).
- Always use strict and warnings.
- Don't use `$a` or `$b` as variable names because they have special meanings in Perl.
- DRY (you have repeated `$a` unnecessarily in your sample code).

Anyway, here is a very simple example of how I would go about it: `use strict; use warnings; # Using block lexical scope to data-hide %junksites { my %junksites = ( 'bollyinside.com' => 1, 'www.bollyinside.com' => 1, 'worldtrademarkreview.com' => 1, 'www.worldtrademarkreview.com' => 1, ); sub KnownJunkSite { my $val = shift; return exists $junksites{$val}; } } for my $v ('bollyinside.com', 'fred', 'www.worldtrademarkreview.com') { print "$v: ", KnownJunkSite($v) ? "found\n" : "not found\n"; }`

Running this little test program produces: `bollyinside.com: found fred: not found www.worldtrademarkreview.com: found`

Update: Should you write a Procedural Module or an OO Module or just use a Hash? In your case, if I wrote a module, I'd use OO. See also:
- Re: Procedural vs OOP modules (this is how I decide: OO or Procedural?)
- High Performance Game of Life (package Organism in "Perl Solution" section is a simple example of a Perl OO module based on a hash)

... though I'd also consider not writing a module at all, instead just using a hash/hashref directly, as analysed below in my reply to this reply.
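For the OO option mentioned in the update, a minimal sketch of what a hash-based OO module could look like follows. It is an illustration only, not the code from the linked nodes: the method name `is_junk` and the constructor arguments are assumptions, and the package is kept in the same file as the caller for brevity.

```perl
use strict;
use warnings;

package JunkSites;

# Constructor: takes a list of site names and stores them as hash keys.
sub new {
    my ($class, @sites) = @_;
    my %self = map { $_ => 1 } @sites;
    return bless \%self, $class;
}

# Returns 1 if the site is in this object's list, 0 otherwise.
# (is_junk is a hypothetical method name chosen for this sketch.)
sub is_junk {
    my ($self, $site) = @_;
    return exists $self->{$site} ? 1 : 0;
}

package main;

my $junk = JunkSites->new(qw(bollyinside.com www.bollyinside.com));
for my $v ('bollyinside.com', 'fred') {
    print "$v: ", ($junk->is_junk($v) ? "found" : "not found"), "\n";
}
```

Because each object carries its own hash, several independent site lists can coexist, which a single module-level hash would not allow.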
Re^6: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 23, 2021 at 11:34 UTC
If performance is an issue, you might consider eliminating the lookup function call overhead (and the arguments about whether your module should use state or our or lexically scoped my ;-) simply by not writing a function at all, and instead performing your hash lookups directly. To test this idea, I wrote the following little benchmark: `use strict; use warnings; use Benchmark qw(timethese); # Test hash my %junksites = ( 'bollyinside.com' => 1, 'www.bollyinside.com' => 1, 'worldtrademarkreview.com' => 1, 'www.worldtrademarkreview.com' => 1, ); sub KnownJunkSite { my $val = shift; return $junksites{$val}; } my @testlist = ( 'bollyinside.com', 'www.bollyinside.com', 'fred', 'www.worldtrademarkreview.com' ) x 1000; sub fn_lookup { for my $v (@testlist) { my $t = KnownJunkSite($v); } } sub hash_lookup { for my $v (@testlist) { my $t = $junksites{$v}; } } my $href = \%junksites; sub hashref_lookup { for my $v (@testlist) { my $t = $href->{$v}; } } timethese 50_000, { Fn => sub { fn_lookup() }, Hash => sub { hash_lookup() }, HashRef => sub { hashref_lookup() }, };`

Running the little benchmark program above on my laptop displayed: `Benchmark: timing 50000 iterations of Fn, Hash, HashRef... Fn: 28 wallclock secs (27.56 usr + 0.00 sys = 27.56 CPU) @ 1814.03/s Hash: 11 wallclock secs (11.62 usr + 0.00 sys = 11.62 CPU) @ 4301.08/s HashRef: 12 wallclock secs (12.02 usr + 0.00 sys = 12.02 CPU) @ 4161.12/s`

Note that in the sample code above, to make a direct hash lookup more pleasing to the eye, I eliminated the call to exists simply by ensuring all keys have the true value `1`. No surprise that using the hash directly is a lot faster than calling a function every time you do a lookup (roughly 2.4 times faster in the run above). Also of interest is that a hash lookup is only marginally faster than a hashref lookup.
Based on this benchmark, rather than agonizing over whether your function should use block lexical scope or the Perl 5.10 state feature or an our variable, you might instead choose not to use a function at all! That is, perform the lookup directly via a hash, rather than a function call. Note that using a hashref, rather than a hash, gives you the flexibility to call your code with many different hashes, at a minuscule performance cost.
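To illustrate the flexibility of a hashref, here is a minimal sketch of code that works with whichever lookup table it is handed. The names `count_junk`, `%news_junk` and `%shop_junk`, and the `example.org` entries, are hypothetical and not part of the benchmark above.

```perl
use strict;
use warnings;

# Hypothetical lookup tables; every key maps to a true value so that a
# plain lookup works without calling exists.
my %news_junk = map { $_ => 1 } qw(bollyinside.com www.bollyinside.com);
my %shop_junk = map { $_ => 1 } qw(example.org www.example.org);

# Count how many of the given sites appear in the table passed as a hashref.
sub count_junk {
    my ($table, @sites) = @_;
    return scalar grep { $table->{$_} } @sites;
}

my @sites = qw(bollyinside.com fred www.example.org);
print "news: ", count_junk(\%news_junk, @sites), "\n";
print "shop: ", count_junk(\%shop_junk, @sites), "\n";
```

The same loop body serves both tables; only the reference passed in changes.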
Re^5: Hash versus chain of elsifs by kcott (Archbishop) on Nov 22, 2021 at 11:45 UTC
G'day mldvx4,

I agree with others that a hash is likely to be more efficient than a chain of `elsif`s. Having said that, as a general rule-of-thumb, you should Benchmark: Perl may have already optimised what you're trying to do (so you'd be both wasting your time and bloating your code); different algorithms may be more or less efficient depending on the data (e.g. number of strings, individual length of strings, total size of data); and so on. Don't guess; benchmark.

"Any other performance and style tips or pointers welcome."

- When asked for sample code, provide code that we can run and output that shows it runs correctly. If you can't get your code to produce the desired output, indicate what you expected and show what you actually got (including all error and warning messages verbatim between `<code>...</code>` tags). I suggest you read "SSCCE".
- Your package name should probably only contain one '`n`', i.e. `JunkSites`.
- Always put `use strict;` and `use warnings;` at the top of your code.
- Don't use `$a` or `$b` as general variables. They're special. See "$a".
- Use state, instead of my, to declare persistent variables. Note that `state` was introduced in Perl v5.10.
- The code you show for `sub KnownJunkSite {...}` looks very wrong. See my example code below for what I think is closer to what you're after.

Example code: `#!/usr/bin/env perl use 5.010; use strict; use warnings; my @test_sites = qw{x.com y.com www.z.com www.y.com}; check_junk($_) for @test_sites; sub KnownJunkSite { my ($key) = @_; state $is_junksite = { map +($_, 1), qw{ x.com www.x.com z.com www.z.com } }; return exists $is_junksite->{$key} ? 1 : 0; } sub check_junk { my ($key) = @_; say "$key: ", KnownJunkSite($key); }`

Output: `x.com: 1 y.com: 0 www.z.com: 1 www.y.com: 0`

-- Ken
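In the spirit of "Don't guess; benchmark", a minimal sketch of how the two approaches from the thread title could be compared with the core Benchmark module follows. The sub names `by_hash` and `by_elsif` and the four-entry site list are assumptions made for illustration; results will depend on the Perl version, the machine, and above all the real number of sites, so no timings are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical site list, for illustration only.
my %is_junk = map { $_ => 1 } qw(x.com www.x.com z.com www.z.com);

# Candidate 1: a single hash lookup.
sub by_hash {
    my ($site) = @_;
    return exists $is_junk{$site} ? 1 : 0;
}

# Candidate 2: a chain of string comparisons.
sub by_elsif {
    my ($site) = @_;
    if    ($site eq 'x.com')     { return 1 }
    elsif ($site eq 'www.x.com') { return 1 }
    elsif ($site eq 'z.com')     { return 1 }
    elsif ($site eq 'www.z.com') { return 1 }
    else                         { return 0 }
}

my @test_sites = qw(x.com y.com www.z.com www.y.com);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'hash_lookup' => sub { by_hash($_)  for @test_sites },
    'elsif_chain' => sub { by_elsif($_) for @test_sites },
});
```

With only four entries the chain is short; to get a meaningful answer, the comparison should be repeated with a list of realistic size.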
Re^5: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 10:39 UTC
You can make your `%junksites` variable persist either by moving it outside the sub (in a block lexical scope) or by making it a state variable. For a simple example of these two approaches, see the `%rtoa` variable at:
- Rosetta PGA-TRAM (uses Perl simple lexical scoping)
- Re: Rosetta PGA-TRAM (uses Perl state variable)
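As a rough side-by-side of the two approaches applied to the junk-site lookup, consider the sketch below. The sub names `is_junk_lexical` and `is_junk_state` are hypothetical, and the state version uses a hashref on the assumption that the code may need to run on Perls older than 5.28, where (to my knowledge) a state hash cannot be given an initializer directly.

```perl
use 5.010;    # enables the say and state features
use strict;
use warnings;

# Approach 1: a lexical hash in an enclosing block, visible only to the sub.
# The block must run before the sub is first called so the hash is populated.
{
    my %junksites = map { $_ => 1 } qw(bollyinside.com www.bollyinside.com);

    sub is_junk_lexical {
        my ($site) = @_;
        return exists $junksites{$site} ? 1 : 0;
    }
}

# Approach 2: a state variable, initialized once on the first call.
sub is_junk_state {
    my ($site) = @_;
    state $junksites = { map { $_ => 1 } qw(bollyinside.com www.bollyinside.com) };
    return exists $junksites->{$site} ? 1 : 0;
}

say "$_: ", is_junk_lexical($_), " ", is_junk_state($_) for qw(bollyinside.com fred);
```

In both versions the hash is built once rather than on every call, which is the point of making it persistent.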
Re^6: Hash versus chain of elsifs by NERDVANA (Deacon) on Nov 22, 2021 at 11:04 UTC
Or better, declare it outside the function as `our %junksites`. That way someone can make changes to it if needed.
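A minimal sketch of what that could look like, with the package kept in the same file as the caller for brevity. The `example.org` entry is a made-up addition used only to show a caller changing the list.

```perl
use strict;
use warnings;

package JunkSites;

# Package variable: reachable from other code as %JunkSites::junksites.
our %junksites = map { $_ => 1 } qw(bollyinside.com www.bollyinside.com);

sub KnownJunkSite {
    my ($site) = @_;
    return exists $junksites{$site} ? 1 : 0;
}

package main;

my $before = JunkSites::KnownJunkSite('example.org');   # not listed yet
$JunkSites::junksites{'example.org'} = 1;               # caller extends the list
my $after  = JunkSites::KnownJunkSite('example.org');   # now listed
print "before: $before, after: $after\n";
```

The trade-off is that any code can now alter the list, so the data hiding of the lexical and state approaches is given up in exchange for that flexibility.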
Re^7: Hash versus chain of elsifs by BillKSmith (Monsignor) on Nov 23, 2021 at 02:32 UTC
Re^8: Hash versus chain of elsifs by NERDVANA (Deacon) on Nov 25, 2021 at 07:20 UTC
Re^5: Hash versus chain of elsifs by choroba (Cardinal) on Nov 22, 2021 at 12:33 UTC
`my %junksites = ( "bollyinside.com", "www.bollyinside.com", ... "worldtrademarkreview.com", "www.worldtrademarkreview.com", );` Note that a hash needs key/value pairs, so you're storing only the site names without www as keys; the www. prefixed ones are stored as values. Probably not what you want. A fast way to initialize the keys is `my %junksites; @junksites{qw{ bollyinside.com www.bollyinside.com ... worldtrademarkreview.com www.worldtrademarkreview.com }} = ();`
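The difference can be checked with a short sketch. It uses only the four site names shown above (the elided entries are left out), and the variable names `%paired` and `$status` are chosen here for illustration.

```perl
use strict;
use warnings;

# A flat list is consumed as key, value, key, value, ...
my %paired = (
    "bollyinside.com",          "www.bollyinside.com",
    "worldtrademarkreview.com", "www.worldtrademarkreview.com",
);
# Only the two names without "www." end up as keys.
print join(", ", sort keys %paired), "\n";

# A hash slice makes every name a key (each with an undef value).
my %junksites;
@junksites{qw{
    bollyinside.com www.bollyinside.com
    worldtrademarkreview.com www.worldtrademarkreview.com
}} = ();
print scalar(keys %junksites), " keys\n";

my $status = exists $junksites{'www.bollyinside.com'} ? "found" : "not found";
print "www.bollyinside.com: $status\n";
```

Since the slice assigns undef to every key, lookups against such a hash have to use `exists`; a plain truth test on the value would always fail.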

