Re^7: "exists $hash{key}" is slower than "$hash{key}"

Re^8: "exists $hash{key}" is slower than "$hash{key}" by LanX (Saint) on Jan 07, 2020 at 16:00 UTC
So the `->#NUMBER` at the end of each line is a GOTO, and the `-` labels indicate ignored op-codes?

    2 <;> nextstate(main 74 -e:1) v:%,{,469764096 ->3
    - <1> ex-exists vK/1 ->4
    3 <+> multideref($h{"a"}) vK/EXISTS ->4

Cheers Rolf (addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice
Re^9: "exists $hash{key}" is slower than "$hash{key}" by dave_the_m (Monsignor) on Jan 07, 2020 at 17:03 UTC
Effectively yes. The `->#N` is the pointer to the next op to be executed (the op_next field), which is often not in the same order as the optree structure. Op nodes displayed as "ex-FOO" are ops that are no longer needed and have been converted into OP_NULL (with any attached data freed), but with their former type still recorded (mainly as a debugging aid). These OP_NULLs are usually removed from the execution path.

Note that `-MO=Concise,-exec` shows ops in execution order, which often makes things clearer.

Dave.
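For readers who want to look at the execution-order listing from inside a script rather than from the command line, here is a minimal sketch using the documented B::Concise functions `compile` and `walk_output`. The sub `$probe`, the hash `%h` and the key `a` are placeholders invented for this illustration, not code from the thread, and the exact ops printed will depend on the perl version.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical sub to inspect; %h and the key "a" are placeholders.
my $probe = sub {
    my %h = (a => 1);
    return exists $h{a};
};

# Send the rendering to a scalar instead of STDOUT.
B::Concise::walk_output(\my $listing);

# '-exec' asks for execution order rather than the tree layout.
my $walker = B::Concise::compile('-exec', $probe);
$walker->();

print $listing;
```

In this mode each line is followed by the op that runs next, so nulled "ex-FOO" ops that have been taken off the execution path should not clutter the listing.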
Re^8: "exists $hash{key}" is slower than "$hash{key}" by swl (Parson) on Jan 09, 2020 at 03:00 UTC
Thanks for the comments and clarifications.

If I update the benchmark code to use the ternary operator, and only a global assignment, then the general pattern remains on Windows, but less so on Linux. Is anyone able to replicate these results? (The benchmark code, 1371 bytes, was attached to the original post.)

I ran the above code on both a Linux box (perlbrew 5.30.0, CentOS 7) and a Windows laptop (Windows 10, Strawberry perl 5.30.0). Each was repeated four times. (In previous posts I used Strawberry 5.28.0, but the Windows machine is the same.)

The relative differences on the Linux machine are very small and the order changes between runs. One of the value calls is fastest in each of the runs, but not by much in absolute terms, and the exists call is second fastest in three of the four runs.

On Windows the value calls are always faster than the exists calls and the relative differences are much greater.

(The Linux and Windows result tables, 1131 bytes each, were attached to the original post.)

And I should reiterate my point from the original post that the relative differences remain very small. If the difference is real, then one would have to be running a very large number of calls for the choice of idiom to make any meaningful difference.

Addendum: After writing the above, I decided to run more replications on Windows to get a better sense of how consistent the results are on my machine, and got the results for 30 replications (attached to the original post, 7 kB). I could have simplified the benchmarks to one of each type, but have left the code as-is for simplicity. Of the 30 reps, 23 show both value calls being faster than either exists call. In only one case was exists fastest.
Re^9: "exists $hash{key}" is slower than "$hash{key}" by dave_the_m (Monsignor) on Jan 09, 2020 at 08:17 UTC
At this point I think you're mainly measuring noise. You've also still got the bug whereby you populate the lexical %hash, but the benchmarks get run against the empty global %hash.

By "noise", I mean a combination of timing noise and (for lack of a better term) "compiler noise". How C code gets compiled can affect the alignment of machine code bytes across cache line boundaries, which means that different compilers can compile the same source code of the perl interpreter into different executables which have different instruction cache and branch prediction miss patterns. I have personally seen adding a line of code into a part of the perl interpreter which wasn't being executed (e.g. in dump.c) cause a 10% change in benchmark speed for a simple benchmark.

These days I mostly benchmark the perl interpreter using a tool of mine (Porting/bench.pl) based on top of cachegrind, which profiles the execution run in terms of how many individual machine code instructions, branches etc. it does. Under that, 'exists' takes slightly fewer instruction reads, data reads and writes, and branches than a hash lookup.

Dave.
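The lexical-versus-global bug mentioned above can be reproduced without Benchmark at all. The following is a minimal sketch under stated assumptions: `My::Runner::run_string` is a hypothetical stand-in for any module that evals a string of benchmark code in its own scope, and `%hash`, `%global` and `lookup_results` are invented for the illustration. It is not Benchmark.pm's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a benchmarking module: the string is compiled
# here, so the caller's lexical variables are not visible to it.
{
    package My::Runner;

    sub run_string {
        my ($code) = @_;
        no strict;    # unqualified names silently become package variables
        return eval "package main; $code";
    }
}

sub lookup_results {
    my %hash    = (a => 1);    # lexical: populated, but invisible to run_string
    our %global = (a => 1);    # package variable %main::global: visible

    my $via_lexical_name = My::Runner::run_string('exists $hash{a} ? 1 : 0');
    my $via_global_name  = My::Runner::run_string('exists $global{a} ? 1 : 0');
    return ($via_lexical_name, $via_global_name);
}

my ($lexical, $global) = lookup_results();
print "string code sees \$hash{a}:   $lexical\n";    # prints 0
print "string code sees \$global{a}: $global\n";     # prints 1
```

The first lookup quietly consults the empty `%main::hash` instead of the populated lexical, which is why a string-form benchmark can end up timing lookups against an empty hash without any error being raised.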
Re^10: "exists $hash{key}" is slower than "$hash{key}" by swl (Parson) on Jan 09, 2020 at 21:56 UTC
Thanks once again.

Changing the `my %hash` line to `our %hash` makes the results much more variable, with exists being fastest about half the time across ten runs. If the Porting/bench.pl tool shows fewer instructions, branches etc. for `exists` then I'll take that as being a more authoritative test.

For future readers, adding an explicit `use warnings;` to the script does not raise any warnings with the lexical hash in the benchmark code. Benchmark.pm does not use warnings and explicitly disables strict when evaling strings of benchmark code (see sub _doeval in the code). String-form benchmark code might avoid sub overheads, but more care needs to be taken with the code.

For purposes of posterity, the compilers used to compile the perls I used were gcc 7.1.0 for Strawberry perl on Windows, and gcc 6.2.0 on Linux.
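One way to sidestep the scoping issue, at the cost of the sub-call overhead mentioned above, is to pass code references so that the benchmark code closes over the lexical hash. This is only a sketch: the key `1500`, the variable `$sink` and the case names are assumptions made for the illustration, and the timings it prints are subject to the same noise discussed in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Lexical hash, captured by the closures below.
my %hash = map { $_ => 1 } 1001 .. 2000;
my $key  = 1500;    # arbitrary key known to be present
my $sink;           # assignment target, as in the thread's benchmarks

my $results = timethese(
    -1,             # run each case for at least 1 CPU second
    {
        'exists' => sub { $sink = exists $hash{$key} ? 1 : 2 for 1 .. 10_000 },
        'value'  => sub { $sink = $hash{$key}        ? 1 : 2 for 1 .. 10_000 },
    },
    'none',         # suppress timethese's own report
);

cmpthese($results);
```

Because the subs are compiled in the script's own scope, `use strict` applies to them and a misspelled or out-of-scope hash name is caught at compile time.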
Re^9: "exists $hash{key}" is slower than "$hash{key}" by Anonymous Monk on Jan 09, 2020 at 09:15 UTC
Is it noise?

    sub BenchIt {
        print "\n\n## $^O $]\n";
        use Benchmark qw {:all};
        our %hash;
        for (1001..2000) {
            $hash{$_}++;
        }
        our $key1 = 2000 - int rand 1001;
        our $key2 = 2000 - int rand 1001;
        $hash{$key1} = 1;
        $hash{$key2} = {1..10};
        our $xx_global;
        cmpthese (
            -2,
            {
                svExist  => 'for(1..10_000){$xx_global = exists $hash{$key1} ? 1 : 2}',
                svValue  => 'for(1..10_000){$xx_global = $hash{$key1} ? 1 : 2}',
                refExist => 'for(1..10_000){$xx_global = exists $hash{$key2} ? 1 : 2}',
                refValue => 'for(1..10_000){$xx_global = $hash{$key2} ? 1 : 2}',
            }
        );
        return;
    }

Results from a few old perls (2 kB) were attached to the original post, as was a further set (1023 bytes) with the note: laptops fluctuate :)


	PerlMonks