http://qs321.pair.com?node_id=1082078


in reply to map {} list or do {} for list?

Thanks to all who contributed to this thread. I have put together a quick benchmark. Unless I made an error in my code, map is the clear winner on performance grounds, as well as on confusing-any-newbie-who-has-to-read-my-code grounds ;-)

                   Rate or delete        do      copy     slice       map
    or delete 1620746/s        --       -1%      -50%      -71%      -78%
    do        1642036/s        1%        --      -49%      -70%      -78%
    copy      3215434/s       98%       96%        --      -42%      -56%
    slice     5555556/s      243%      238%       73%        --      -24%
    map       7299270/s      350%      345%      127%       31%        --
Here is the code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # create a hash with a few values deviating
    my $t_count = 3;
    my %hosts = map { $_ => $t_count - ( rand 50 > 49 ? 1 : 0 ) } (1 .. 1000);

    my $count = 10_000_000;
    cmpthese($count, {
        ' do'       => ' do {delete $hosts{$_} unless $hosts{$_} == $t_count} for keys %hosts ',
        ' map'      => ' map {delete $hosts{$_} unless $hosts{$_} == $t_count} keys %hosts ',
        'or delete' => ' $hosts{$_} == $t_count or delete $hosts{$_} for keys %hosts ',
        ' slice'    => ' delete @hosts{ grep { $hosts{$_} != $t_count } keys %hosts } ',
        ' copy'     => ' %hosts = map { $hosts{$_} == $t_count ? ($_, $hosts{$_}) : ()} keys %hosts ',
    });
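Before comparing speeds, it may be worth checking that the five snippets really do the same job. The following is a minimal sketch, not part of the original code: it wraps each idiom in a closure (so the lexicals are visible to it), runs each one on its own copy of the same input, and compares what is left. The deterministic input (every 50th key deviates, where the original used rand) and the names %variants, %kept and %signature are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sanity check: every idiom gets its own copy of the same input.
my $t_count = 3;

# Deterministic input (an assumption): every 50th key deviates from $t_count.
my %hosts = map { $_ => ( $_ % 50 == 0 ? $t_count - 1 : $t_count ) } 1 .. 1000;

# Each variant receives a flattened copy of %hosts and returns the pruned hash.
my %variants = (
    'do' => sub {
        my %t = @_;
        do { delete $t{$_} unless $t{$_} == $t_count } for keys %t;
        return \%t;
    },
    'map' => sub {
        my %t = @_;
        map { delete $t{$_} unless $t{$_} == $t_count } keys %t;
        return \%t;
    },
    'or delete' => sub {
        my %t = @_;
        $t{$_} == $t_count or delete $t{$_} for keys %t;
        return \%t;
    },
    'slice' => sub {
        my %t = @_;
        delete @t{ grep { $t{$_} != $t_count } keys %t };
        return \%t;
    },
    'copy' => sub {
        my %t = @_;
        %t = map { $t{$_} == $t_count ? ($_, $t{$_}) : () } keys %t;
        return \%t;
    },
);

# Record how many keys survive and a sorted "key=value" fingerprint of the result.
my (%signature, %kept);
for my $name (sort keys %variants) {
    my $pruned = $variants{$name}->(%hosts);
    $kept{$name}      = keys %$pruned;
    $signature{$name} = join ',', map { "$_=$pruned->{$_}" } sort { $a <=> $b } keys %$pruned;
}

# If all idioms agree, there is exactly one distinct fingerprint.
my %distinct = map { $_ => 1 } values %signature;
printf "%-9s kept %d keys\n", $_, $kept{$_} for sort keys %kept;
print scalar(keys %distinct), " distinct result(s)\n";
```

On this input every variant should keep 980 keys and produce a single shared result.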

Cheers,
R.

Pereant, qui ante nos nostra dixerunt! (May those who said our things before us perish!)

Re^2: map {} list or do {} for list? - Benchmarks
by davido (Cardinal) on Apr 12, 2014 at 17:53 UTC

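    When you see operations per second that high, you should ask yourself whether any work is being done. In fact, none really is, and your benchmark is rendered totally unreliable as a result. First, you have scoping issues: Benchmark compiles string snippets itself, so they cannot see the lexicals %hosts and $t_count that your script declares with my; instead they silently operate on the empty package variables %main::hosts and $main::t_count. And even if you fix that, you are left with the issue that choroba identified: the first benchmark iteration deletes from the master copy of the hash, leaving the remaining iterations with less work to do.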
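    A minimal sketch of the scoping problem, assuming the core Benchmark module compiles string snippets inside its own file (current versions do this with strict turned off there). The hash and variable names are hypothetical. The same lookup runs once as a string snippet against a file-scoped lexical, once as a string snippet against a package variable, and once as a code reference (a closure), which is a different way of handing code to Benchmark than the strings used in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit);

my  %lex_hosts = (a => 1, b => 2);   # hypothetical file-scoped lexical
our %pkg_hosts = (a => 1, b => 2);   # hypothetical package variable (%main::pkg_hosts)

our ($seen_lex, $seen_pkg);

# String snippets are compiled by Benchmark, outside this file's lexical
# scope, so %lex_hosts here means the empty global %main::lex_hosts.
timeit(1, '$main::seen_lex = keys %lex_hosts;');

# A package variable is reachable from anywhere through its full name.
timeit(1, '$main::seen_pkg = keys %main::pkg_hosts;');

# A code reference is a closure, so it sees the lexical directly.
my $seen_closure;
timeit(1, sub { $seen_closure = keys %lex_hosts });

print "string snippet saw $seen_lex keys of the lexical\n";
print "string snippet saw $seen_pkg keys of the package hash\n";
print "closure saw $seen_closure keys of the lexical\n";
```

    With the core Benchmark module the first snippet should report 0 keys, while the other two report 2.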

    Here's a version that works around the scoping issues that string-evaled code creates (it declares the variables with our and refers to them by their full names), and that makes a copy of %hosts on each iteration. That copy costs time, but it costs the same amount of time for each snippet.

        use Benchmark qw(cmpthese);

        our $t_count = 3;
        our %hosts = map { $_ => $t_count - ( rand 50 > 49 ? 1 : 0 ) } (1 .. 1000);

        my $count = 10000;
        cmpthese($count, {
            do    => 'my %t = %main::hosts; do { delete $t{$_} unless $t{$_} == $main::t_count } for keys %t;',
            map   => 'my %t = %main::hosts; map { delete $t{$_} unless $t{$_} == $main::t_count } keys %t;',
            or    => 'my %t = %main::hosts; $t{$_} == $main::t_count or delete $t{$_} for keys %t;',
            slice => 'my %t = %main::hosts; delete @t{ grep { $t{$_} != $main::t_count } keys %t };',
            copy  => 'my %t = %main::hosts; %t = map { $t{$_} == $main::t_count ? ($_, $t{$_}) : ()} keys %t;',
            nada  => 'my %t = %main::hosts;'
        });

    And here's the output I get:

                Rate   copy    map     or     do  slice   nada
        copy  1372/s     --   -57%   -57%   -57%   -58%   -78%
        map   3175/s   131%     --    -2%    -2%    -2%   -50%
        or    3226/s   135%     2%     --     0%    -0%   -49%
        do    3226/s   135%     2%     0%     --    -0%   -49%
        slice 3236/s   136%     2%     0%     0%     --   -49%
        nada  6289/s   358%    98%    95%    95%    94%     --

    "nada" is just there to identify how much time we're wasting making a fresh copy of the hash on each iteration.
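    Because every snippet pays for the same copy, the nada rate can serve as a rough baseline. The sketch below converts the rates from the output above into microseconds per iteration and subtracts nada's time. It assumes the copy cost is identical for every snippet and simply additive, which is only approximately true, so treat the results as ballpark figures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates (iterations per second) copied from the cmpthese output above.
my %rate = (
    'copy'  => 1372,
    'map'   => 3175,
    'or'    => 3226,
    'do'    => 3226,
    'slice' => 3236,
    'nada'  => 6289,   # only copies the hash: the shared baseline
);

# Time per iteration in microseconds is 1e6 / rate; subtracting the
# baseline leaves a rough estimate of the deletion work alone.
my $baseline_us = 1e6 / $rate{nada};
my %net_us;
for my $name (grep { $_ ne 'nada' } keys %rate) {
    $net_us{$name} = 1e6 / $rate{$name} - $baseline_us;
}

printf "%-5s %6.1f us per pass beyond the copy\n", $_, $net_us{$_}
    for sort { $net_us{$a} <=> $net_us{$b} } keys %net_us;
```

    On these numbers the copy variant spends roughly 570 microseconds per pass beyond the baseline, against roughly 150 to 156 for the others.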

    As you can see, all of the approaches except the copy one are so close that they're probably within the margin of error. Use the one that seems most legible, and if there's a risk that it won't be understood, wrap it in a well-named subroutine.
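    For example, the slice variant could be hidden behind a descriptive name. The subroutine name prune_hosts_not_at_count and the sample data are hypothetical; any of the near-equal idioms could go inside it. This sketch modifies the hash in place and returns the keys it removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every entry of %$hosts whose value differs
# from $expected. Modifies the hash in place and returns the deleted keys.
sub prune_hosts_not_at_count {
    my ($hosts, $expected) = @_;
    my @stale = grep { $hosts->{$_} != $expected } keys %$hosts;
    delete @{$hosts}{@stale};
    return @stale;
}

my %hosts = (alpha => 3, beta => 2, gamma => 3, delta => 1);

# Two statements on purpose: "sort prune_hosts_not_at_count(...)" would
# treat the sub name as a sort comparator.
my @removed = prune_hosts_not_at_count(\%hosts, 3);
@removed = sort @removed;

print "removed: @removed\n";
print "kept:    @{[ sort keys %hosts ]}\n";
```

    The call site now says what it does, whichever idiom is inside.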


    Dave

Re^2: map {} list or do {} for list? - Benchmarks
by choroba (Cardinal) on Apr 12, 2014 at 17:03 UTC
    Once the first benchmarked subroutine has run, your %hosts hash is smaller: the entries that differ from $t_count are already gone, so no deletion ever happens again.
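    A small sketch of this effect, assuming the scoping problem has already been fixed so the code really sees the hash. The same deletion (the slice idiom) is applied three times to one shared hash with deterministic input (every 50th key deviates, an assumption), and the number of keys removed on each pass is recorded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demonstration: repeat one deletion on a single shared hash,
# as the original benchmark effectively did, and count removals per pass.
my $t_count = 3;
my %hosts = map { $_ => ( $_ % 50 == 0 ? $t_count - 1 : $t_count ) } 1 .. 1000;

my @deleted_per_pass;
for my $pass (1 .. 3) {
    my $before = keys %hosts;
    delete @hosts{ grep { $hosts{$_} != $t_count } keys %hosts };
    push @deleted_per_pass, $before - scalar(keys %hosts);
}

print "pass $_: deleted $deleted_per_pass[$_ - 1] keys\n" for 1 .. @deleted_per_pass;
```

    On this input the first pass removes 20 keys and the later passes remove none, so every iteration after the first only measures the scan.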
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ