http://qs321.pair.com?node_id=997326

marquezc329 has asked for the wisdom of the Perl Monks concerning the following question:

I'm working through Hall and Schwartz's Effective Perl Programming: Writing Better Programs with Perl and had a small question about the passage below:
To a certain extent, idiom and style overlap. Some idioms, like print sort <>, are inarguable, but there are certainly gray areas:
foreach $key (sort keys %h) { print "$key: $h{$key}\n"; }
Print key-value pairs from %h one per line.
print map "$_: $h{$_}\n", sort keys %h;
Another way to print key-value pairs.

The first example above is very plain Perl. It is efficient and readable and uses only basic features of the language. The second example is shorter and, some might argue, has a higher "cool factor" because it uses the nifty map operator and a list context in place of the mundane foreach loop in the first. However, you should consider yourself and your potential audience before leaving code like the second example for posterity, because it is definitely more obscure (but not that obscure) and might even be less efficient.
In my reading on PerlMonks I've definitely noticed a bias towards the map and grep functions, so I've been taking strides towards becoming more comfortable using them in my code (basic as it may be). Even so, I would like to understand: in what ways would the map version of the code above be less efficient than the more verbose foreach loop? Is efficiency being sacrificed for the versatility/style of map? Is there a difference in efficiency at all? I assumed the two methods above to be interchangeable, but should I be more selective about the situations where I use map?
The above segment is taken from Effective Perl Programming: Writing Better Programs with Perl, by Joseph N. Hall and Randal L. Schwartz. Addison Wesley, December 30, 1997. ISBN 0-201-41975-0. It is in no way my own.

Replies are listed 'Best First'.
Re: Efficiency of map vs. more verbose basic/fundamental code
by BrowserUk (Patriarch) on Oct 04, 2012 at 22:49 UTC
    Even so, I would like to understand in what ways the map version of the code above would be less efficient than the more verbose foreach loop?

    You see the print part of the statement? That means the statement is going to take at least milliseconds regardless of what comes after it, so efficiency -- in terms of time -- is a red herring. And if the hash is large, the sort is going to dominate the time required.

    What the authors are probably, clumsily, trying to allude to is the memory (in)efficiency if the hash happens to contain a large number of keys. The one-line statement uses a lot of memory because it creates several intermediate, stack-based lists in order to execute. If there are 1 million keys in %h, then this is what that looks like internally:

    print <1e6 key/value/newline> map "$_: $h{$_}\n", <1e6 ordered keys> sort <1e6 unordered keys>;

    You can see how memory hungry that can be: memory usage increases by 163 MB in order to execute that statement.

    However, if the hash is (or could be) big enough for that to be a serious concern, then the book's alternative construction is almost as bad:

    foreach $key ( <1e6 sorted items> sort <1e6 unordered items> keys %h ) { print "$key: $h{$key}\n"; }

    Here, the memory usage increases by 56 MB to process the loop. Better, but still pretty wasteful; that part cannot be avoided if you use sort.
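    If you want to see numbers like these for yourself, here is a minimal sketch of one way to watch the process's peak memory. It assumes Linux, where /proc/self/status reports VmHWM (the peak resident set size); the exact figures will vary with your perl build:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Return the highest resident set size this process has reached, in kB.
    sub peak_kb {
        open my $fh, '<', '/proc/self/status' or die "no /proc/self/status: $!";
        while (<$fh>) { return $1 if /^VmHWM:\s+(\d+)\s+kB/ }
        die "VmHWM not found";
    }

    my %h;
    $h{$_} = $_ for 1 .. 1_000_000;    # a million keys, as above

    open my $null, '>', '/dev/null' or die $!;
    my $before = peak_kb();
    print {$null} map "$_: $h{$_}\n", sort keys %h;    # the map version
    printf "peak memory grew by about %.0f MB\n", ( peak_kb() - $before ) / 1024;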

    However, you can avoid spreading the single semantic operation -- display the hash pairs ordered by key -- over three lines, whilst also avoiding map's increased memory usage, by doing it this way:

    print "$_ : $h{ $_ }\n" for sort keys %h;

    Clean, concise and a single line representing a single semantic operation; but then there are some books that will tell you that is "difficult (for beginners) to read".

    But in the end, unless you are routinely dealing with huge datasets, all three are much of a muchness and it really comes down to what you personally prefer. Trying to guess what the next guy to come along will find most readable is a mug's game. If there are 3 ways to do something, whichever you choose you'll be wrong 2/3rds of the time.



      I find statement modifiers to be the clearest way to write code, but maybe I'm just used to them. Still, I'm having trouble seeing why turning a standard for loop into a statement modifier cuts down on the memory usage. Why are examples A and B below different? Is it that the parentheses cause a list to be created in memory? Would example C be as bad as A?

      # A
      for ( sort keys %h ) {
          dostuffwith($_);
      }

      # B
      dostuffwith($_) for sort keys %h;

      # C
      dostuffwith($_) for (sort keys %h);

      I guess I need to learn to benchmark memory usage.


        Still, I'm having trouble seeing why turning a standard for loop into a statement modifier cuts down on the memory usage.

        I didn't suggest it did.

        Only that it reduced the memory requirement compared to the map variant (the postfix for never builds map's second list of formatted strings); and source code/mindspace compared to the standard for loop.



Re: Efficiency of map vs. more verbose basic/fundamental code
by eyepopslikeamosquito (Archbishop) on Oct 04, 2012 at 22:47 UTC

    The later edition of Effective Perl Programming has item 20, "Use foreach, map and grep as appropriate", which gives an excellent summary of when to use foreach, map and grep:

    • Use foreach to iterate read-only over each element of a list
    • Use map to create a list based on the contents of another list
    • Use foreach to modify elements of a list
    • Use grep to select elements in a list

    Hall, McAdams and foy further caution against modifying a list via map:

    "For efficiency, $_ is actually an alias for the current element in the iteration. If you modify $_ within the transform expression of a map, you modify the input data. This is generally considered to be bad style, and -- who knows? -- you may even wind up confusing yourself this way. If you want to modify the contents of a list, use foreach."

    I'm pretty sure the earlier edition you are reading has a similar item. If you read this item, you should not be confused about which one to use. I suggest you focus on clarity and only bother benchmarking if performance is really critical.
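
    A small, self-contained illustration of that caution (my own example, not the book's):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @words = qw(foo bar baz);

    # Bad style: s/// modifies $_, which is aliased to the elements of @words.
    my @caps = map { s/(\w+)/\u$1/; $_ } @words;
    print "@words\n";      # prints "Foo Bar Baz" -- the input was modified!

    # If modification is the goal, say so with foreach:
    my @nums = (1, 2, 3);
    $_ *= 10 foreach @nums;
    print "@nums\n";       # prints "10 20 30"

    # If a transformed copy is the goal, have map return new values:
    my @labels = map { "n=$_" } @nums;
    print "@labels\n";     # prints "n=10 n=20 n=30"; @nums is untouched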

Re: Efficiency of map vs. more verbose basic/fundamental code
by Anonymous Monk on Oct 04, 2012 at 21:06 UTC

    Even so, I would like to understand in what ways the map version of the code above would be less efficient than the more verbose foreach loop

    There are only two ways it could be less efficient: it uses more memory, or it takes longer. You can use Benchmark to find out.

    But I suspect there isn't much difference in either today, since that statement was published in 1998.

      Thank you for your quick reply and pointing me in the direction of Benchmark. Observant me didn't even think of time passed since the statement was made, but I figured since I asked the question I may as well go ahead and learn how to use Benchmark and get the answer. For anybody interested in the results:

      For the sake of simplicity I used the code from the passage broken into subroutines to be called by Benchmark:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw(:all);

      my %h;
      @h{'A'..'Z','a'..'z'} = 1..52;

      sub verbose {
          my $hash = shift;
          foreach my $key (sort keys %$hash) {
              print "$key: $hash->{$key}\n";
          }
      }

      sub idiom {
          my $hash = shift;
          print map "$_: $hash->{$_}\n", sort keys %$hash;
      }

      timethese(100000, {
          'Verbose' => 'verbose(\%h)',
          'Idiom'   => 'idiom(\%h)',
      });
      timethese(1000000, {
          'Verbose' => 'verbose(\%h)',
          'Idiom'   => 'idiom(\%h)',
      });
      timethese(10000000, {
          'Verbose' => 'verbose(\%h)',
          'Idiom'   => 'idiom(\%h)',
      });

      RESULTS:

      Benchmark: timing 100000 iterations of Idiom, Verbose...
      Idiom: 0 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU) @ 1250000.00/s (n=100000)
      (warning: too few iterations for a reliable count)
      Verbose: 0 wallclock secs ( 0.12 usr + 0.00 sys = 0.12 CPU) @ 833333.33/s (n=100000)
      (warning: too few iterations for a reliable count)

      Benchmark: timing 1000000 iterations of Idiom, Verbose...
      Idiom: 1 wallclock secs ( 0.76 usr + 0.00 sys = 0.76 CPU) @ 1315789.47/s (n=1000000)
      Verbose: 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 892857.14/s (n=1000000)

      Benchmark: timing 10000000 iterations of Idiom, Verbose...
      Idiom: 8 wallclock secs ( 7.52 usr + 0.00 sys = 7.52 CPU) @ 1329787.23/s (n=10000000)
      Verbose: 12 wallclock secs (11.76 usr + 0.00 sys = 11.76 CPU) @ 850340.14/s (n=10000000)


      So at least for this particular piece of code, map proves to be faster. Thanks again for helping me answer my own question. As always, I'm still a beginner, so if this is misguided/messy code or I tested this wrong, please let me know.
        Edit: Just so others don't stop here: this benchmark doesn't actually test the functions it claims to (Benchmark's string evals can't see the lexical %h, so the subs were iterating over an empty hash). Read the following messages for a more accurate benchmark of map vs. foreach. Hint: foreach wins in the end.

        Hmmm, well now I'm a little surprised. I was going to argue that if you're going to benchmark then you must compare apples to apples. In your example, 'verbose' uses an extra variable 'my $key' which the mapping version does not. I thought that might be a factor, so I made a new function 'verbose2' which uses $_ like the mapping function, expecting that it might be on par (or at least faster than 'verbose').

        But using $_ in place of $key was actually slower? So I tried one more time with 'verbose3', rewriting the function exactly the same as the mapping version, only using foreach instead of map. In my mind, verbose2 and verbose3 are exactly the same code and Perl should have interpreted them the same at run time, but verbose3 was slower still.

        Can a Perl innards expert explain why verbose2 is slower than verbose? Or why verbose3 is slower than verbose2?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw(:all);

        my %h;
        @h{'A'..'Z','a'..'z'} = 1..52;

        sub verbose {
            my $hash = shift;
            foreach my $key (sort keys %$hash) {
                print "$key: $hash->{$key}\n";
            }
        }

        sub verbose2 {
            my $hash = shift;
            foreach (sort keys %$hash) {
                print "$_: $hash->{$_}\n";
            }
        }

        sub verbose3 {
            my $hash = shift;
            print "$_: $hash->{$_}\n" foreach sort keys %$hash;
        }

        sub idiom {
            my $hash = shift;
            print map "$_: $hash->{$_}\n", sort keys %$hash;
        }

        timethese(1000000, {
            'Verbose'  => 'verbose(\%h)',
            'Verbose2' => 'verbose2(\%h)',
            'Verbose3' => 'verbose3(\%h)',
            'Idiom'    => 'idiom(\%h)',
        });
        timethese(10000000, {
            'Verbose'  => 'verbose(\%h)',
            'Verbose2' => 'verbose2(\%h)',
            'Verbose3' => 'verbose3(\%h)',
            'Idiom'    => 'idiom(\%h)',
        });
        Results:

        Benchmark: timing 1000000 iterations of Idiom, Verbose, Verbose2, Verbose3...
             Idiom:  1 wallclock secs ( 0.85 usr +  0.00 sys =  0.85 CPU) @ 1176470.59/s (n=1000000)
           Verbose:  1 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 793650.79/s (n=1000000)
          Verbose2:  0 wallclock secs ( 1.34 usr +  0.00 sys =  1.34 CPU) @ 746268.66/s (n=1000000)
          Verbose3:  2 wallclock secs ( 1.39 usr +  0.00 sys =  1.39 CPU) @ 719424.46/s (n=1000000)

        Benchmark: timing 10000000 iterations of Idiom, Verbose, Verbose2, Verbose3...
             Idiom:  8 wallclock secs ( 8.57 usr +  0.00 sys =  8.57 CPU) @ 1166861.14/s (n=10000000)
           Verbose: 12 wallclock secs (12.92 usr +  0.00 sys = 12.92 CPU) @ 773993.81/s (n=10000000)
          Verbose2: 13 wallclock secs (13.06 usr +  0.00 sys = 13.06 CPU) @ 765696.78/s (n=10000000)
          Verbose3: 15 wallclock secs (13.90 usr +  0.01 sys = 13.91 CPU) @ 718907.26/s (n=10000000)

        I tried your benchmark and I also see that map is faster. But when I changed the subs to return concatenated results instead of printing, the for loop became faster. I wonder why?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw/cmpthese/;

        my %h = map { $_ => $_ } 1 .. 10000;

        sub test1 {
            my $buff = '';
            foreach my $key (sort keys %h) {
                $buff .= "$key: $h{$key}\n";
            }
            return $buff;
        }

        sub test2 {
            return join('', map "$_: $h{$_}\n", sort keys %h);
        }

        print test1() eq test2() ? "same\n" : "not same\n";

        my %tests = (
            '01_for' => \&test1,
            '02_map' => \&test2,
        );
        cmpthese(
            -10,    # for 10 CPU secs
            \%tests
        );

        __DATA__
                  Rate 02_map 01_for
        02_map  33.9/s     --   -16%
        01_for  40.2/s    19%     --
        Update: I remember a previous thread in which a "for loop" consumed lots of memory compared to a while loop.

        hello, again

        I noticed that the $hash arg is never actually passed to the functions. You don't see the output of the print statements, do you? Would you try this?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw(:all);

        my %h;
        @h{'A'..'Z','a'..'z'} = 1..52;

        sub verbose {
            #my $hash = shift;
            foreach my $key (sort keys %h) {
                print "$key: $h{$key}\n";
            }
        }

        sub idiom {
            #my $hash = shift;
            print map "$_: $h{$_}\n", sort keys %h;
        }

        my %tests = (
            '01_verbose' => \&verbose,
            '02_idiom'   => \&idiom,
        );
        cmpthese(
            -10,    # for 10 CPU secs
            \%tests
        );

        __DATA__
                       Rate   02_idiom 01_verbose
        02_idiom     5341/s         --        -4%
        01_verbose   5577/s         4%         --
        Personally, I like map.
        regards.

Re: Efficiency of map vs. more verbose basic/fundamental code
by marquezc329 (Scribe) on Oct 04, 2012 at 23:16 UTC
    I can see why benchmarking such a simple expression is trivial. I suppose there are several lessons learned here.
    • Thanks to BrowserUk, it is clear that in this particular situation the memory-conservative:

      print "$_ : $h{ $_ }\n" for sort keys %h;

      is the best choice taking both efficiency and elegance into consideration.

    • I learned how to benchmark my subroutines, which will hopefully be of service as I learn to write more complex code.

    • And, possibly most importantly, I need to update my study materials!!!

    Thanks again everybody!
Re: Efficiency of map vs. more verbose basic/fundamental code
by Marshall (Canon) on Oct 05, 2012 at 03:17 UTC
    I think that Versions (1) and (2) do exactly the same thing.
    Version 2 is easier to understand, and I think that it is at least as fast, if not faster. If we are benchmarking things, then we need to compare "apples to apples".
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    ## version 1....
    my $num = 0;
    my %h = map { $_, $num++ } ('A'..'Z','a'..'z');
    print Dumper \%h;

    ## version 2...
    my %hash;
    foreach my $letter ('A'..'Z','a'..'z') {
        $hash{$letter}++;
    }
    print Dumper \%hash;

    foreach (sort keys %h) {
        print "$_ OK\n" if $hash{$_};
    }
      Can you explain what you mean by "apples to apples"? You're not the first to mention it. I honestly have never run a benchmark before and did it on the code that was in the passage only to test the statement the author made about the comparative efficiency.
      I was thinking that if both pieces of code did the exact same thing and I benchmarked them I could figure out if speed was being sacrificed by the use of the more brief coding technique. BrowserUk explained that the speed on this particular operation was trivial, but the sacrifice could potentially be in memory if we were to apply this procedure on a larger scale. So I thought I had my answer. Am I approaching this problem in the wrong way? In this case what are my apples and oranges and how would I go about doing a proper benchmark? I apologize if I'm asking too many questions.

        Two important things in running a benchmark are:

        1) Make sure your subroutines are doing the same thing with the same data. I made the mistake once of setting up the data outside the subroutine and then editing it in place, so that the second routine to run was working with different data than the first. Make sure each one starts with the same thing and has the same result.

        2) Avoid anything that adds overhead that could overshadow the time used by the function you're actually trying to benchmark. If you're comparing for and map, just include them and as little else as possible. Especially avoid any system calls, IO like printing, etc. Print the results once to make sure your outputs match, but then remove those statements. A difference of nanoseconds won't show through the random noise if you're calling functions that take milliseconds. A sketch putting both rules into practice follows.
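
        A minimal sketch along those lines (the sub names are mine, purely illustrative): both subs get identical data, build and return the same string instead of printing it, are checked once for equal output, and are passed to cmpthese as code refs rather than strings so they can see the lexical %h.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        my %h;
        @h{'A' .. 'Z', 'a' .. 'z'} = 1 .. 52;

        # Same work, same data; return the result rather than printing it,
        # so no I/O overhead pollutes the timing.
        sub with_foreach {
            my $out = '';
            $out .= "$_: $h{$_}\n" foreach sort keys %h;
            return $out;
        }

        sub with_map {
            return join '', map "$_: $h{$_}\n", sort keys %h;
        }

        # Rule 1: verify once that the outputs match before timing anything.
        die "outputs differ" unless with_foreach() eq with_map();

        cmpthese(-5, {    # run each sub for about 5 CPU seconds
            foreach => \&with_foreach,
            map     => \&with_map,
        });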
