http://qs321.pair.com?node_id=1081973

Random_Walk has asked for the wisdom of the Perl Monks concerning the following question:

Which construct is preferred, and why?

# I am looking for hosts that appear in all of a set of SQL tables.
# I keep a table count and increment the count for a host for each table in
# which it occurs.
# Then I clean out those that were not in all tables.

# Faked data
my %hosts = (
    a => 3,
    b => 3,
    c => 4,
    d => 3,
    e => 2,
    f => 3,
    g => 3,
);
my $table_count = 3;

# This:
do {delete $hosts{$_} unless $hosts{$_} == $table_count} for keys %hosts;

# or This:
map {delete $hosts{$_} unless $hosts{$_} == $table_count} keys %hosts;
Or is there a better way to do it?

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!

Re: map {} list or do {} for list?
by LanX (Saint) on Apr 11, 2014 at 14:29 UTC
    TIMTOWTDI, both are fine.

    IIRC old versions of Perl had limitations with map in void context, but that's history.

    Some might argue that this is more readable

    for my $key ( keys %hosts ) {
        if ( $hosts{$key} != $table_count ) {
            delete $hosts{$key};
        }
    }

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update

    inverted unless to if

    update
    this works
    while ( my ($key, $value) = each %hosts ) {
        delete $hosts{$key} if $value != $table_count;
    }

    but I'm not sure about side effects!

    edit

    aha -> each

    If you add or delete elements of a hash while you’re iterating over it, you may get entries skipped or duplicated, so don’t. Exception: It is always safe to delete the item most recently returned by "each()", which means that the following code will work:

    while (($key, $value) = each %hash) {
        print $key, "\n";
        delete $hash{$key};   # This is safe
    }
        No I didn't¹ ..

        But the problem ...

        a) seems to be introduced by the new hash randomization, so it's a bug

        and

        b) is not new: you can't nest each %hash because it has global side effects². In my case it's only local to the loop.
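
        A minimal sketch of that shared-iterator problem, using a throwaway hash of my own rather than the OP's data: both loops call each on the same hash, so they fight over its single internal iterator.

        # Illustration only: %h is a made-up example hash.
        my %h = ( a => 1, b => 2, c => 3 );
        while ( my ($outer) = each %h ) {
            # The inner loop advances the SAME iterator the outer loop uses,
            # runs it to exhaustion, and thereby resets it ...
            while ( my ($inner) = each %h ) {
                print "$outer / $inner\n";
            }
            # ... so the next outer each() starts from the first key again
            # and the outer loop never terminates.
        }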

        But yeah, I would love to have something like hashgrep and hashmap in core ...
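
        In the meantime something along those lines is easy enough to roll by hand; this hashgrep is only a sketch of my own, not a core or CPAN function:

        # Hypothetical helper: keep only the pairs whose ($key, $value) satisfy the block.
        sub hashgrep (&\%) {
            my ( $code, $hash ) = @_;
            return map  { ( $_ => $hash->{$_} ) }
                   grep { $code->( $_, $hash->{$_} ) }
                   keys %$hash;
        }

        # Usage with the OP's data:
        %hosts = hashgrep { $_[1] == $table_count } %hosts;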

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        ¹) maybe I should shift some of my time from PM to blogs.perl.org to follow Reini, Aristoteles and Damian...

        ²) maybe better phrased as "bound to the hash ref". I can imagine situations where passing around the hashref and iterating over it in different places is useful.

      Hi Rolf,

      just wanted to post this snippet:

      for my $host (keys %hosts) {
          next if $hosts{$host} == $table_count;
          delete $hosts{$host};
      }

      but checked the incoming answers first to avoid posting something redundant. In this case I had to smile because I do agree with your "taste" (++).

      UPDATE: Had a logic error in there: changed < to ==.

      McA

Re: map {} list or do {} for list?
by hdb (Monsignor) on Apr 11, 2014 at 14:50 UTC

    Or with a slice:

    delete @hosts{ grep { $hosts{$_} != $table_count } keys %hosts };
Re: map {} list or do {} for list?
by SuicideJunkie (Vicar) on Apr 11, 2014 at 14:30 UTC

    map returns a list. It should be used when you want the list being generated as output.

    do{...} for should be used when you just want to do things in a loop.

    Since your statement is in void context, you obviously don't want the resulting list. Thus, the for loop is what you want.
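
    To make the distinction concrete, here is a small example of my own (not the OP's code):

    my @words = qw( alpha be gamma );

    # map: the point is the list it returns
    my @lengths = map { length } @words;      # (5, 2, 5)

    # for: the point is the side effect; no list is wanted
    my %seen;
    $seen{$_}++ for @words;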

      That was my feeling about map too. But I have seen it used in void context in a few bits of code recently, and wondered if there was any solid argument for or against, other than that it looks wrong :)

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!

        Well, it certainly still works, but it's misleading. You're building a big list of results and then immediately throwing it away.

Re: map {} list or do {} for list?
by choroba (Cardinal) on Apr 11, 2014 at 14:35 UTC
    Just to add to TIMTOWTDI:
    $hosts{$_} == $table_count or delete $hosts{$_} for keys %hosts;

    Update: Fixed; I had used and instead of or.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      This deletes the entries the OP wanted to keep...

        Thanks, fixed.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: map {} list or do {} for list?
by tobyink (Canon) on Apr 11, 2014 at 21:11 UTC

    If I wanted it to be fast and concise, I'd probably go with:

    $hosts{$_}==$table_count or delete($hosts{$_}) for keys %hosts;
    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
Re: map {} list or do {} for list?
by jethro (Monsignor) on Apr 11, 2014 at 14:30 UTC

    You know the perl motto? "There is more than one way to do it"

Re: map {} list or do {} for list?
by Laurent_R (Canon) on Apr 11, 2014 at 17:41 UTC
    Or copying the hash onto itself, so to speak:
    %hosts = map { $hosts{$_} == $table_count ? ($_, $hosts{$_}) : () } keys %hosts;
Re: map {} list or do {} for list? - Benchmarks
by Random_Walk (Prior) on Apr 12, 2014 at 12:32 UTC

    Thanks to all who contributed to this thread. I have put together a quick benchmark. Unless I made an error in my code, map is the clear winner on performance grounds, as well as on the grounds of confusing any newbie who has to look at my code ;-)

                     Rate or delete        do      copy     slice       map
    or delete   1620746/s        --       -1%      -50%      -71%      -78%
    do          1642036/s        1%        --      -49%      -70%      -78%
    copy        3215434/s       98%       96%        --      -42%      -56%
    slice       5555556/s      243%      238%       73%        --      -24%
    map         7299270/s      350%      345%      127%       31%        --
    Here is the code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # create a hash with a few values deviating
    my $t_count = 3;
    my %hosts = map { $_ => $t_count - ( rand 50 > 49 ? 1 : 0 ) } (1 .. 1000);

    my $count = 10_000_000;

    cmpthese($count, {
        '       do' => ' do {delete $hosts{$_} unless $hosts{$_} == $t_count} for keys %hosts ',
        '      map' => ' map {delete $hosts{$_} unless $hosts{$_} == $t_count} keys %hosts ',
        'or delete' => ' $hosts{$_} == $t_count or delete $hosts{$_} for keys %hosts ',
        '    slice' => ' delete @hosts{ grep { $hosts{$_} != $t_count } keys %hosts } ',
        '     copy' => ' %hosts = map { $hosts{$_} == $t_count ? ($_, $hosts{$_}) : ()} keys %hosts ',
    });

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!

      When you see operations per second that high, you should ask yourself whether any work is being done. In fact, none really is, and your benchmark is rendered unreliable as a result. First, you have scoping issues. And even if you fix those, you are still left with the issue that choroba identified: the first benchmark iteration deletes from the master copy of the hash, leaving the remaining iterations with less work to do.

      Here's a version that codes around the scoping issues that evaled code creates, and that makes a copy of %hosts on each iteration. That copy costs time, but it costs the same amount of time for each snippet.

      use Benchmark qw(cmpthese);

      our $t_count = 3;
      our %hosts = map { $_ => $t_count - ( rand 50 > 49 ? 1 : 0 ) } (1 .. 1000);

      my $count = 10000;

      cmpthese($count, {
          do    => 'my %t = %main::hosts; do { delete $t{$_} unless $t{$_} == $main::t_count } for keys %t;',
          map   => 'my %t = %main::hosts; map { delete $t{$_} unless $t{$_} == $main::t_count } keys %t;',
          or    => 'my %t = %main::hosts; $t{$_} == $main::t_count or delete $t{$_} for keys %t;',
          slice => 'my %t = %main::hosts; delete @t{ grep { $t{$_} != $main::t_count } keys %t };',
          copy  => 'my %t = %main::hosts; %t = map { $t{$_} == $main::t_count ? ($_, $t{$_}) : ()} keys %t;',
          nada  => 'my %t = %main::hosts;',
      });

      And here's the output I get:

               Rate  copy   map    or    do slice  nada
      copy   1372/s    --  -57%  -57%  -57%  -58%  -78%
      map    3175/s  131%    --   -2%   -2%   -2%  -50%
      or     3226/s  135%    2%    --    0%   -0%  -49%
      do     3226/s  135%    2%    0%    --   -0%  -49%
      slice  3236/s  136%    2%    0%    0%    --  -49%
      nada   6289/s  358%   98%   95%   95%   94%    --

      "nada" is just there to identify how much time we're wasting making a fresh copy of the hash on each iteration.

      As you can see, all of the approaches except for the copy one are so close that they're probably within the margin of error. Use the one that seems most legible, and if there's a risk that it won't be comprehended, encapsulate by wrapping it in a well-named subroutine.
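
      Another way to sidestep the scoping issue, sketched here without being benchmarked, is to hand cmpthese code references instead of strings; the closures see ordinary lexicals, so the package-variable workaround isn't needed at all:

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      my $t_count = 3;
      my %hosts = map { $_ => $t_count - ( rand 50 > 49 ? 1 : 0 ) } (1 .. 1000);

      cmpthese(10000, {
          slice     => sub {
              my %t = %hosts;   # the closure sees the lexical %hosts directly
              delete @t{ grep { $t{$_} != $t_count } keys %t };
          },
          or_delete => sub {
              my %t = %hosts;
              $t{$_} == $t_count or delete $t{$_} for keys %t;
          },
          nada      => sub { my %t = %hosts },
      });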


      Dave

      After running the first benchmarked subroutine, your %hosts hash gets smaller. The deleting never happens again.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ