UPDATE 2020-01-10: Actually, it's not. See subthread starting at 11111117.
----
I decided to run some benchmarking on hash exists after some code profiling showed a reasonable amount of time spent on lines with next if exists $hash{$key}.
This is largely in the context of code structured like the (very contrived) example below which uses the common idiom of skipping slow code if it has already been done or is not needed based on a tracking hash.
    my %done;
    my @data = (1..100);
    for (1..100) {
        push @data, int (rand() * 100);
    }

    for my $item (@data) {
        next if exists $done{$item};
        #  do something time consuming
        #  ...
        $done{$item}++;
    }
The code below tries combinations of exists and value checking. Assignment to variables is used to avoid "Useless use of hash element in void context" warnings, and the assignment to globals is to get a sense of how much the timings are related to bookkeeping of lexicals. I could disable warnings but it's the relative timing differences that are useful here, not the absolute times.
    use Benchmark qw {:all};
    use 5.016;

    my %hash;

    #  set up the hash
    for (1001..2000) {
        $hash{$_}++;
    }

    #  two keys we use below
    our $key1 = 1001;
    our $key2 = 1002;

    #  hash key 1 is SV
    $hash{$key1} = 1;
    #  hash key 2 is RV
    $hash{$key2} = {1..10};

    #  assign to global as a baseline
    our $xx_global;

    #  keys are short so the timing table is not too wide
    #  char 1:    e  = exists check,
    #             v  = value check
    #  chars 2,3: ck = constant key,
    #             vk = variable key
    #  chars 4,5: sv = key contains scalar value,
    #             rv = key contains reference
    #  char 6:    l  = assign to lexical,
    #             g  = assign to global
    #  thus
    #  ecksvl means "exists check
    #                using constant key
    #                containing a scalar value,
    #                assigned to lexical"
    #  (the value is clearly redundant for an exists check,
    #  but is retained for completeness)
    my %checks = (
        ecksvl => 'my $x = exists $hash{1001}',
        evksvl => 'my $x = exists $hash{$key1}',
        vcksvl => 'my $x = $hash{1001}',
        vvksvl => 'my $x = $hash{$key1}',
        evksvg => '$xx_global = exists $hash{$key1}',
        vvksvg => '$xx_global = $hash{$key1}',
        eckrvl => 'my $x = exists $hash{1002}',
        evkrvl => 'my $x = exists $hash{$key2}',
        vckrvl => 'my $x = $hash{1002}',
        vvkrvl => 'my $x = $hash{$key2}',
        evkrvg => '$xx_global = exists $hash{$key2}',
        vvkrvg => '$xx_global = $hash{$key2}',
    );

    cmpthese (
        -3,
        \%checks
    );
The code was run using Strawberry Perl 5.28.0, and the results are given in the table below (see the code comments for an explanation of the keys).
The main take home is that the value checks (v prefix) are all faster than the exists checks (e prefix). Assigning to global is faster, presumably because there is less bookkeeping involved, but it will be rare that one would use such a construct anyway.
             Rate evksvl evkrvl ecksvl eckrvl evksvg evkrvg vvksvl vvkrvl vckrvl vcksvl vvksvg vvkrvg
evksvl 10733145/s     --    -5%    -7%   -12%   -15%   -18%   -29%   -31%   -32%   -33%   -41%   -48%
evkrvl 11290643/s     5%     --    -2%    -7%   -10%   -14%   -25%   -27%   -28%   -29%   -38%   -45%
ecksvl 11570664/s     8%     2%     --    -5%    -8%   -12%   -23%   -25%   -27%   -27%   -36%   -44%
eckrvl 12176232/s    13%     8%     5%     --    -3%    -8%   -19%   -21%   -23%   -23%   -33%   -41%
evksvg 12572221/s    17%    11%     9%     3%     --    -5%   -17%   -19%   -20%   -21%   -31%   -39%
evkrvg 13168623/s    23%    17%    14%     8%     5%     --   -13%   -15%   -17%   -17%   -28%   -36%
vvksvl 15082826/s    41%    34%    30%    24%    20%    15%     --    -2%    -4%    -5%   -17%   -27%
vvkrvl 15461840/s    44%    37%    34%    27%    23%    17%     3%     --    -2%    -3%   -15%   -25%
vckrvl 15777625/s    47%    40%    36%    30%    25%    20%     5%     2%     --    -1%   -13%   -23%
vcksvl 15909705/s    48%    41%    38%    31%    27%    21%     5%     3%     1%     --   -13%   -23%
vvksvg 18207860/s    70%    61%    57%    50%    45%    38%    21%    18%    15%    14%     --   -12%
vvkrvg 20580512/s    92%    82%    78%    69%    64%    56%    36%    33%    30%    29%    13%     --
So why is it that exists is slower than checking the value? My starting assumption was that exists should be faster, as getting a value requires checking that it exists first. However, looking at the source code, most of the hash key and value calls are passed through the same function, hv_common. So far as I can tell from reading the code, and based on my limited comprehension of the details, hv_common prioritises getting values over checking key existence and value assignment.
So does this all matter, and should code that uses exists $hash{$key} be changed to use $hash{$key}? Given that even the slowest of the benchmark snippets runs more than 10,000,000 times per second, it does not matter at all for most use cases. One would need to be running hundreds of millions of calls for such a change to start to make a meaningful difference, and some would quite reasonably argue that billions of calls are needed.
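To put rough numbers on that, here is a minimal sketch that turns two of the rates from the table into a per-call difference and scales it up. The choice of evksvl and vvksvl as the pair to compare, and the call counts, are my own picks for illustration; the rates are copied from the table as reported.

```perl
use strict;
use warnings;

# Rates (calls per second) copied from the results table
my $rate_exists = 10_733_145;    # evksvl: exists check, variable key, lexical
my $rate_value  = 15_082_826;    # vvksvl: value check, variable key, lexical

# Difference in cost per call, in seconds
my $saving_per_call = 1 / $rate_exists - 1 / $rate_value;

printf "per call: about %.1f ns\n", $saving_per_call * 1e9;

# Illustrative call counts (assumptions, not measurements)
for my $calls (1e6, 1e8, 1e9) {
    printf "%.0e calls: about %.2f s\n", $calls, $calls * $saving_per_call;
}
```

On these figures the gap is roughly 27 ns per call, so it only adds up to whole seconds somewhere around a hundred million calls.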
Maybe the perl source code could be optimised so exists is not slower, but whether this justifies any additional maintenance burden is not something I can answer.
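One caveat when reading string-based benchmarks like the one above: Benchmark compiles string snippets with eval inside its own code, in the caller's package, so a snippet can see package (our) variables but not file-scoped lexicals declared with my. A snippet that names %hash therefore resolves to the package variable %main::hash, not to a my %hash in the script. The sketch below shows the effect; the variable names are made up for the demonstration, and I have not rerun the timings with this changed, so treat it as something to check rather than as a diagnosis.

```perl
use strict;
use warnings;
use Benchmark qw(timeit);

my  %lex_hash = (1001 => 1);    # file-scoped lexical
our %pkg_hash = (1001 => 1);    # package variable, i.e. %main::pkg_hash
our ($saw_lex, $saw_pkg, $saw_closure);

# String snippets: only package variables are visible to the compiled code
timeit(1, '$saw_lex = exists $lex_hash{1001} ? 1 : 0');
timeit(1, '$saw_pkg = exists $pkg_hash{1001} ? 1 : 0');

# Code reference: a closure, so the lexical hash is visible
timeit(1, sub { $saw_closure = exists $lex_hash{1001} ? 1 : 0 });

print "string snippet, lexical hash: $saw_lex\n";      # 0
print "string snippet, package hash: $saw_pkg\n";      # 1
print "code reference, lexical hash: $saw_closure\n";  # 1
```

Declaring the hash with our, or passing code references instead of strings, makes the snippets operate on the populated hash. The Benchmark documentation notes that code references time slightly slower than the equivalent strings, so the two forms should not be mixed in one comparison.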