http://qs321.pair.com?node_id=702071

dHarry has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I was contemplating the 256 commandments from Damian's "Perl Best Practices" when I encountered:

List Generation: Use map instead of for when generating new lists from old.

Being more used to other programming languages, I often use the non-Perl approach to get the job done. I would typically use a for and probably not even consider map. So this was an eye-opener for me. I decided to do a little test to see how big the difference between the two is.

(FYI: Perl v5.8.8 built for MSWin32-x86-multi-thread running on a Dell INSPIRON 9400)

I use the example as mentioned by Damian and the Benchmark module to test:

use strict;
use warnings;
use Benchmark qw(:all);

my @results;
my $count = -5;

# Populate list with 1 million numbers
for (my $i=0; $i<1000_000; $i++) {
    push @results, $i;
}

cmpthese ( $count, {
    for => "test_for;",
    map => "test_map;",
} );

timethese($count, {
    for => "test_for;",
    map => "test_map;",
} );

sub test_for {
    my @sqrt_results;
    for my $result (@results) {
        push @sqrt_results , sqrt($result);
    }
}

sub test_map {
    my @sqrt_results = map { sqrt $_ } @results;
}

First the comparison:

$count=-1
(warning: too few iterations for a reliable count)
      Rate  for  map
for 2.67/s   -- -10%
map 2.98/s  12%   --

$count=-5
      Rate  map  for
map 3.05/s   -- -16%
for 3.61/s  18%   --

$count=-10
      Rate  for  map
for 2.73/s   --  -8%
map 2.95/s   8%   --

Hmmm, not really impressive this “gain” of using map over for?!

Next some timing:

$count=-1
Benchmark: running for, map for at least 1 CPU seconds...
       for:  1 wallclock secs ( 1.16 usr +  0.00 sys =  1.16 CPU) @  3.46/s (n=4)
       map:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @  3.37/s (n=4)

$count=-5
Benchmark: running for, map for at least 5 CPU seconds...
       for:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @  3.64/s (n=19)
       map:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @  3.45/s (n=18)

$count=-10
Benchmark: running for, map for at least 10 CPU seconds...
       for: 10 wallclock secs (10.11 usr +  0.00 sys = 10.11 CPU) @  3.46/s (n=35)
       map: 10 wallclock secs (10.03 usr +  0.00 sys = 10.03 CPU) @  3.29/s (n=33)

Am I missing something? Is the example given by Damian a poor example? Should I really favor map over for when I want to generate a new list from another list?

Thanks in advance

Update

Besides the obvious advantages (less code, easier to understand), the book states that map is normally considerably faster.

Replies are listed 'Best First'.
Re: map versus for
by dragonchild (Archbishop) on Aug 04, 2008 at 15:06 UTC
    map can be faster because it's theoretically parallelizable, unlike for which is (generally) not.

    The bigger point, though, is that by using map, you're telling me more about your intent with the code. map says "I'm doing something to each element, something that's probably easily described, and accumulating the result." On the other hand, for says "I'm doing something with each element and it could be anything."


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      That's pretty much the distinction I'd make as well, although I usually phrase it another way: for is for generic iteration, map is specifically a transformation.

      A surefire way to annoy me and lose points when we get sample code is people who for whatever reason use map in void context rather than a proper for loop (it doesn't say "LOOK I R IDIOMATIC CODERZ", it says "ITERATION: UR DOIN IT WRONG" (and yes, I do have a lolcat based scale for applicants :)).

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        I have done some research on best practices regarding map. I wanted to ask at one point why we don't just use map instead of foreach. However I found a few nodes about that subject where, as you just did, map in a null context was brought up. However, I am not sure I understand what that means. Could you describe what map in a null context is?

        -Actualize
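
For readers with the same question: "void context" (sometimes loosely called null context) means the value of an expression is simply thrown away. The following is a minimal sketch with made-up data, not code from any of the posts above, contrasting map used for its result with map used only for a side effect:

```perl
use strict;
use warnings;

my @names = qw(alice bob carol);   # hypothetical sample data

# map in list context: the returned list is kept, which is what map is for.
my @upper = map { uc $_ } @names;

# map in void context: the list map builds is discarded and only the
# side effect (appending to @seen) matters.
my @seen;
map { push @seen, $_ } @names;

# The same side effect written as a for loop, which states the intent.
my @seen_for;
push @seen_for, $_ for @names;

print "@upper\n";   # ALICE BOB CAROL
```

Both side-effect versions leave the same elements in their arrays; the objection raised above is about what the code communicates, not about what it computes.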
Re: map versus for
by dreadpiratepeter (Priest) on Aug 04, 2008 at 14:53 UTC
    I don't have my best practices in front of me, but....
    I don't believe that the tip is for performance, but more for readability and clarity. As a general rule, map is better when doing a transformation, for is better when doing an iteration.
    Or more simply, use map when generating a result, use for otherwise.


    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
Re: map versus for
by pc88mxer (Vicar) on Aug 04, 2008 at 15:21 UTC
    I often will use for instead of map for generating lists, especially when I am in the process of developing the code. Once I've got things figured out I might go back and re-code the loop as a map.

    Using for has the following advantages:

    • you have more control and options over loop execution (last, next, etc.)
    • you can use your own more descriptive named lexical instead of $_
    • it's more readable (especially for non-perl experts)
    If the list-generation logic is just a simple transformation, I'll just opt for a map implementation. Once, however, the logic becomes more complex, an explicit for loop begins to look more attractive. For instance, which of the following do you find easier to understand?
    my @result = map { f($_) ? g($_) : () } @list;

    # or:

    my @result;
    for (@list) {
        push(@result, g($_)) if (f($_));
    }
      my @result = map { g($_) } grep { f($_) } @list;

      But that's just me . . .

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.
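
To compare the three spellings side by side, here is a small runnable sketch. The bodies of f() and g() are hypothetical stand-ins (a filter for even numbers and a squaring function); the posts above leave them undefined.

```perl
use strict;
use warnings;

# Hypothetical filter and transformation, chosen only for illustration.
sub f { return $_[0] % 2 == 0 }   # keep even numbers
sub g { return $_[0] ** 2 }       # square them

my @list = 1 .. 6;

# map with a ternary that returns the empty list for rejected elements
my @ternary = map { f($_) ? g($_) : () } @list;

# explicit loop with a conditional push
my @looped;
for (@list) {
    push @looped, g($_) if f($_);
}

# filter first, then transform
my @chained = map { g($_) } grep { f($_) } @list;

print "@ternary\n";   # 4 16 36
```

All three produce the same list. The grep/map chain builds an intermediate list of the elements that pass f(), which the other two forms do not.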

      Oh definitely, because once you put more than one statement into a map, you have violated a bigger stylistic rule.
      In that case, I will either pull the logic into a subroutine and call it from the map (assuming that I may need to reuse it), or switch to a for loop.
      Actually, I will switch to a foreach loop. I find that always using foreach for the
      for $var (@list)
      form and for for the c-style form adds to the grokiness of my code.
      UPDATE: should have looked closer at the body of the for there, I agree with Fletch on that one


      -pete
      "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
Re: map versus for
by philipbailey (Curate) on Aug 04, 2008 at 14:52 UTC
    I don't have Perl Best Practices in front of me, but I suspect the point that Damian Conway is making is that "map" is more idiomatic, in Perl, than a "for" loop, and not necessarily that it is more performant.
Re: map versus for
by ikegami (Patriarch) on Aug 04, 2008 at 19:32 UTC

    Best practices rarely have anything to do with performance. They are usually designed to avoid pitfalls or to increase readability and maintainability, often at the cost of performance.

    I think your assumption that Damian recommended map for performance reasons is flawed, or did he say as much?

Re: map versus for
by toolic (Bishop) on Aug 04, 2008 at 15:50 UTC
    I do have PBP in front of me, and the focus is definitely on style, rather than performance. Perhaps there are some applications for which map is significantly faster than for, but, judging by your experiment, the code example in the book seems not to be one of them.

    A related discussion (with more links and examples) is Map: The Basics in the Tutorials section.

Re: map versus for
by JavaFan (Canon) on Aug 05, 2008 at 12:16 UTC
    A couple of things:
    1. If a benchmark shows a difference of less than 50%, you might as well consider the candidates the same, except on the particular perl and environment you ran the test on, and even then only for that test data. Compiling perl with a different compiler, or with different compiler settings, may reverse the result.
    2. I'm pretty sure Damian doesn't favour map over for because of performance.
    3. Don't take PBP as a gospel. That would be against the spirit of PBP. Take PBP as a starting point to make up your own mind.
    4. If you prefer for over map, by all means, use for. Doing things *your* way, that's the Perl spirit.
Re: map versus for
by dHarry (Abbot) on Aug 05, 2008 at 07:36 UTC

    Instead of replying to every individual response I’ll write a "one-to-many".

    First of all, thanks, monks, for the many reactions. Some of them are really useful and give me insight into the tradeoffs involved and when to prefer one over the other. I especially like the responses of dragonchild, Fletch and pc88mxer, which I think are the most useful to me.

    I can fully appreciate the obvious advantages that map has over for and there is no doubt that using map is usually the better choice. But my eye was drawn to the following two paragraphs:

    "There are a couple of other advantages that aren’t quite as obvious. For example, when you use map, most of your looping and list generation is being done in heavily optimized compiled C-code, not in interpreted Perl. So it’s usually being done considerably faster.

    In addition, the map knows in advance exactly how many elements it will eventually process, so it can pre-allocate sufficient space in the list it’s returning. Or rather it can usually pre-allocate sufficient space. If the map’s block returns more than one value for each element of the original list, then extra allocations will still be necessary. But, even then, not as many as the equivalent series of push statements would require."

    (taken from PBP p11, paragraphs 4 and 5)

    I can understand this in no other way than that map is usually faster than for. Apparently the example given by Damian doesn't show the difference. It would be interesting to construct an example which shows the advantage in performance for map over for. So far I have not succeeded in doing this.
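
One way to probe the pre-allocation argument is to give the for loop the same advantage by sizing the target array up front. The sketch below is an assumption-laden starting point, not a result: the sub names and the 10_000-element list are made up, and no claim is made about which variant wins on any given perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @source = 1 .. 10_000;   # hypothetical input size

# Plain push loop: the array grows as needed.
sub with_push {
    my @out;
    push @out, sqrt($_) for @source;
    return @out;
}

# Pre-extended loop: size the target once, then assign by index.
sub with_prealloc {
    my @out;
    $#out = $#source;                          # pre-extend to the final length
    $out[$_] = sqrt($source[$_]) for 0 .. $#source;
    return @out;
}

sub with_map {
    return map { sqrt $_ } @source;
}

# Benchmark only on request: perl script.pl bench
cmpthese(-2, {
    push     => sub { my @r = with_push() },
    prealloc => sub { my @r = with_prealloc() },
    map      => sub { my @r = with_map() },
}) if @ARGV && $ARGV[0] eq 'bench';
```

Each variant is called in list context and assigns to an array, so all three do comparable work; any numbers it prints apply only to the perl and machine it is run on.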

      ...when you use map, most of your looping and list generation is being done in heavily optimized compiled C-code, not in interpreted Perl. So it’s usually being done considerably faster.

      Alas for the static nature of the printed word.

      This assertion used to be true (back when I was first developing PBP in early 2004). I wouldn't have written it if the example I used hadn't confirmed the statement when benchmarked.

      But in all the recent versions of Perl I currently have installed (5.8.3, 5.8.8, 5.10.0), a for outdoes the corresponding map on every test I can think to run. Needless to say, PBP edition 2 will be updated to reflect the current behaviours, but that doesn't fix the thousands of existing copies with that overconfident assertion in them. Sigh. One lives and learns.

      Damian

        Wouldn't this be because of the improvements made in 5.8 re: more efficient aliasing of the loop variable?

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: map versus for
by actualize (Monk) on Aug 04, 2008 at 17:13 UTC

    The speed increase involves minimizing resource-intensive work inside tasks such as sorting. Instead of running code every time you iterate over a loop, you perform a map on the data once, storing the output in an array. Then you can use the information from the array. The classic example would be the Schwartzian transform.

    -Actualize
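
A minimal sketch of the Schwartzian transform mentioned above, using hypothetical data: the sort key (here, string length) is computed once per element by the first map instead of on every comparison.

```perl
use strict;
use warnings;

# Hypothetical data: sort words by length, computing each length only once.
my @words = qw(pear fig banana kiwi apple);

my @sorted =
    map  { $_->[1] }                  # 3. unwrap the original value
    sort { $a->[0] <=> $b->[0]        # 2. sort on the cached key,
           or $a->[1] cmp $b->[1] }   #    breaking ties alphabetically
    map  { [ length($_), $_ ] }       # 1. pair each value with its key
    @words;

print "@sorted\n";   # fig kiwi pear apple banana
```

The chain reads bottom to top. For a key as cheap as length the caching gains little; the pattern is aimed at expensive keys.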
Re: map versus for
by GrandFather (Saint) on Aug 05, 2008 at 11:41 UTC

    Well, it all depends on what you do with the results. Consider the following benchmark code:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @source = 1 .. 1000000;
    my @result = @source;

    cmpthese (-3, {
        for  => '@result = test_for ()',
        map  => '@result = test_map ()',
        mapf => '@result = test_mapf ()',
        forv => 'test_for ()',
        mapv => 'test_mapf ()',
    });

    sub test_for {
        my @sqrt_results;
        for my $result (@source) {
            push @sqrt_results , sqrt($result);
        }
        return @sqrt_results;
    }

    sub test_map {
        return my @sqrt_results = map { sqrt $_ } @source;
    }

    sub test_mapf {
        return map { sqrt $_ } @source;
    }

    Prints:

           Rate   map   for  mapf  forv  mapv
    map  2.24/s    --   -3%  -31%  -49%  -78%
    for  2.31/s    3%    --  -29%  -48%  -77%
    mapf 3.27/s   46%   41%    --  -26%  -68%
    forv 4.43/s   98%   92%   36%    --  -56%
    mapv 10.1/s  353%  339%  210%  129%    --

    Update: and for about 100,000 or fewer elements percentages are:

             Rate   for   map  mapf  forv  mapv
    for   23849/s    --   -3%  -29%  -46%  -76%
    map   24506/s    3%    --  -27%  -45%  -76%
    mapf  33666/s   41%   37%    --  -24%  -66%
    forv  44571/s   87%   82%   32%    --  -56%
    mapv 100159/s  320%  309%  198%  125%    --

    These figures are for my @source = 1 .. 100;, but the percentages are very much the same over a wide range of element counts.


    Perl reduces RSI - it saves typing
Re: map versus for
by MidLifeXis (Monsignor) on Aug 04, 2008 at 17:06 UTC

    I have not done much with perl profiling, but when doing a performance comparison, don't you need to return the same results? It looks like the test_for returns the number of elements in the post-push array, whereas test_map returns the new array, at least if I am reading push correctly.

    Update: Note that I am not saying that the results will change much. In fact, here are mine for 60 seconds. test_for2 returns the array at the end of the test function.

         s/iter for2  map  for
    for2   2.70   --  -1%  -1%
    map    2.68   1%   --  -0%
    for    2.67   1%   0%   --

    Benchmark: running for, for2, map for at least 60 CPU seconds...
           for: 61 wallclock secs (61.37 usr +  0.02 sys = 61.39 CPU) @  0.37/s (n=23)
          for2: 61 wallclock secs (61.37 usr +  0.03 sys = 61.40 CPU) @  0.37/s (n=23)
           map: 62 wallclock secs (61.48 usr +  0.03 sys = 61.51 CPU) @  0.37/s (n=23)

    Update 2: Would some kind monk be willing to comment on whether it is sufficient to just call the raw function (as in the OP), or whether the function also needs to return into a context of some sort? In other words, should there be another layer of function call here to force list context to make this a valid comparison?

    --MidLifeXis
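
The context a sub is called in can be inspected with wantarray, which may help when deciding how to phrase a benchmark. A minimal sketch follows; probe() and @log are hypothetical names. A bare statement such as "test_for;" discards its value, so, as far as I can tell, the OP's benchmark strings run their subs in void context.

```perl
use strict;
use warnings;

my @log;

# Record the context of each call: wantarray is true in list context,
# false but defined in scalar context, and undef in void context.
sub probe {
    push @log, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return;
}

probe();             # value discarded: void context
my $s = probe();     # assigned to a scalar: scalar context
my @l = probe();     # assigned to an array: list context

print "@log\n";      # void scalar list
```

Forcing list context in a benchmark is then a matter of assigning the call to an array, as the '@result = test_for ()' strings in GrandFather's reply do.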

Re: map versus for
by jbert (Priest) on Aug 05, 2008 at 10:48 UTC
    You're right. I'm surprised too (not least because map could actually be implemented that way within perl...). I get a lot of variability (perl 5.8.8, x86_64-linux-gnu-thread-multi), but the for loop is a little faster.

    All the comments about better style still stand, but I would certainly have expected the map to be faster. This is odd. Anyone have a 5.10 to hand to see if anything has changed?

    Sidenote:

    # Populate list with 1 million numbers
    for (my $i=0; $i<1000_000; $i++) {
        push @results, $i;
    }

    is perhaps better written as:

    my @results = (0..1_000_000);

    (Of course, given your observation regarding map, whether it performs better is now a whole 'nother ball of wax.)
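
One detail worth checking when swapping the C-style loop for a range: the range operator includes both endpoints, so 0..1_000_000 yields one more element than a loop that stops at $i < 1_000_000. A small sketch, with the size reduced to a hypothetical $n of 10_000 to keep it light:

```perl
use strict;
use warnings;

my $n = 10_000;   # reduced from 1_000_000 for illustration

# C-style loop, as in the original post: indices 0 .. $n - 1
my @loop;
for (my $i = 0; $i < $n; $i++) {
    push @loop, $i;
}

my @range     = (0 .. $n - 1);   # same contents as the loop
my @inclusive = (0 .. $n);       # one element more: the upper bound is included

print scalar(@loop), " ", scalar(@range), " ", scalar(@inclusive), "\n";
# 10000 10000 10001
```

For a benchmark of this size one extra element is unlikely to matter, but it does mean the two snippets are not strictly equivalent.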