Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Efficient Looping Constructs

by demerphq (Chancellor)
on Sep 29, 2001 at 05:41 UTC ( [id://115559]=perlmeditation: print w/replies, xml ) Need Help??

One of the things that we all love about Perl is TMTOWTDI. The options allow us to excerise our creativity in many ways, however it has its down side. Having so many options to choose from means we need to be all the more discriminatory as to the choices we make. An example is that recently I've been browsing through the Monastery archives and I've noticed is what to me are lots of inappropriate looping constructs being used. Read ineffecient for inappropriate. They seem to be used incorrectly either because the author comes from a C background and is unaware that looks are deceiving, or because the author misunderstands basic perl ideas.

Now before I go further lets deal with the 'If you want fast then you shouldn't be using Perl, use C' objections. Crap. That kind of logic would lead someone who has just discovered that their Perl bubblesort implementation is too slow to rwrite it in C, instead of doing the obvious and using perls built in sort. Why would I want to use code that is twice as slow as code that is just is easy to write in the same language?? Sure if overall kickass speed was the bottom line I wouldnt use Perl. But being aware of and taking advantage of the optimizations that Perl allows isn't wrong, its smart.

The inappropriate loop structure I see the most is the 3 argument for. Any time I see this being used to iterate over numbers from $lower to $higher I assume the author is inexperienced in the Perl world, when its an experienced member of the community I basically wonder what they were thinking.

The next is the inappropriate use of grep and map. grep seems to be the ideal subject for abuse. Lots of people seem to think of it as an 'element finder'. Uh, No. Its a filter. The idea is for it to be used on lots of elements to find subsets of the original elements, normally more than one. There are _much_ more efficient ways of finding a single element in an array. map() is abused in a similar way. It is NOT a for loop. Ok, if you're writing an obfu, or you _really_ dont care about speed then fine use map or grep. But its a piece of shorthand that should be used only rarely, that is when you need to transform most elements from one array into a different set of elements in a new array. Incidentally if you decide to use map or grep in void context then grep seems to be faster.

So heres what to keep in mind:

  • Stay away from the 3 arg for loop. This actually is dealt with by perl as while(){}continue{} and accordingly has the overhead of creating two scoped blocks added in. Its very slow.
  • If you must use the 3 arg for loop (loop from $highest to $lowest) *only* use the first two args. Perl is mart enough to convert these into while(){} and avoid the extra continue block.
  • Map is a tranform function. Dont use it instead of a for loop unless you actually need to apply a function to each and every element in a loop. If you want to use it in void context then grep will probably be faster.
  • Grep is a filter. Dont use it if you are looking for only one element. But for some reason its faster than map.
  • If you dont need a scoped block then try to write looping code without it using modifiers. Creative use of the logical operators and the comma operator come in use here.
  • for (LIST) is basically speaking the most efficient way to iterate. This includes the use of the '..' construct
So to put a little money where my mouth is here are some benchmarks.
use Benchmark 'cmpthese'; our $n=10; our $sum=0; cmpthese (-5,{ 'grep{}' => '$sum=0; grep {$sum+=$_} 0..$n;', 'grep()' => '$sum=0; grep ($sum+=$_,0..$n);', 'map{}' => '$sum=0; map {$sum+=$_} 0..$n;', 'map()' => '$sum=0; map ($sum+=$_,0..$n);', 'for_modif' => '$sum=0; $sum+=$_ foreach 0..$n;', 'foreach' => '$sum=0; foreach my $i (0..$n) {$sum+=$i} +', 'for(;;)3' => '$sum=0; for (my $i=0;$i<=$n;$i++) {$sum+ +=$i}', 'for(;;)2' => '$sum=0; for (my $i=0;$i++<=$n;) {$sum+=$ +i}', 'w{}c{}' => '$sum=0; my $i=0; while($i<=$n){$sum+=$i} +continue{$i++}', 'while' => '$sum=0; my $i=0; while($i<=$n){$sum+=$i+ ++};', }); __END__ #last two columns removed for space reasons Rate map{} grep{} map() for(;;)3 w{}c{} grep() for(;;)2 w +hile map{} 39416/s -- -3% -21% -34% -34% -41% -45% +-47% grep{} 40750/s 3% -- -18% -32% -32% -39% -44% +-46% map() 49897/s 27% 22% -- -16% -17% -25% -31% +-33% for(;;)3 59526/s 51% 46% 19% -- -0% -11% -18% +-21% w{}c{} 59804/s 52% 47% 20% 0% -- -10% -17% +-20% grep() 66614/s 69% 63% 34% 12% 11% -- -8% +-11% for(;;)2 72308/s 83% 77% 45% 21% 21% 9% -- + -4% while 75007/s 90% 84% 50% 26% 25% 13% 4% + -- foreach 80124/s 103% 97% 61% 35% 34% 20% 11% + 7% for_modif 84073/s 113% 106% 68% 41% 41% 26% 16% + 12%
Now this all applies to 5.6, I'd be interested in knowing how it differs from earlier version.

Anyway, I suppose this is a bit of a rant, but its something that has gotten to me a bit lately, especially respected members of our community posting what I consider to be 'clever but BS' solutions to problems. To me if a solution is not as effecient as is reasonable then it shouldnt be posted.

:-)

Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

Replies are listed 'Best First'.
Re (tilly) 1: Efficient Looping Constructs
by tilly (Archbishop) on Sep 29, 2001 at 07:41 UTC
    While I agree with what you want people to do, I think you have hit on the single worst possible reason to do it.

    Namely code tuning.

    There are far better reasons that you can and should point out. For instance one of the most common errors in C is a fencepost error. If you use C-style loops, it will be as common in Perl as it is in C. (In fact you have a fencepost error in your benchmark code.) If you loop using Perlish foreach loops, this error practically vanishes.

    So by using foreach style looping you can kill one of your most common bugs. Not bad.

    As for the abuses of map and grep, yes, they waste time and memory. However they also make your code less clear. They do more than a foreach loop does, and therefore you have more to think about when you run across one. That for me is a bigger deal than code tuning.

    Oh, and a note. People say that they use map in golf because it is shorter. Wrong.

    map$_++,@_; $_++for@_;
    Even for golfing purposes, gratuitous abuse of map and grep is usually misguided.

    Now if you can write code using a hash lookup instead of a grep, well that is not just code tuning, that is an algorithmic improvement. When you start talking algorithmic improvements, you get huge performance increases. But even so a hash lookup is clearer. So even if the program will run fast enough either way, I would use the hash lookup for clarity.

    And everything that I just said about writing using Perl's syntactic sugar is sufficient for me to use it, regardless of whether it was faster. And it is important to think that way. Because if you talk to people about how fast constructs are, and teach them to think at that level, before you know it they will miss the forest for the trees.

    Why would any performance oriented person use a hash instead of an array for a structure? Array access uses less memory and runs faster! But a hash is self documenting. It is faster to debug. You make fewer mistakes. What it costs in computer time and energy is more than made up for in human time.

    I want people to use hashes. I want that because I don't want to waste my time wading through buggy and unreadable positionally based logic. And it doesn't happen until programmers understand that there is such a thing as "fast enough" and from then on their time is worth more than the computer's time. (Besides which, worrying about maintainability gives you more leisure to profile and find bigger speed increases later. Worrying about speed at every step is likely to result in a slower program in the end.)

      Hear hear! My personal cycles are much more valuable than those of a computer.

      The three main bottlenecks to program execution speed are (in order of increasing cost):

      1. RAM
      2. CPU
      3. Program efficiency
      It may not be politically-correct and it may not be the thing to say, but inefficient programs that are "good enough" are, well, "good enough". You don't have to achieve perfection. If your program executes faster than the human mind can register, but uses a bubble-sort ... who cares?!? It's not scalable, but if it doesn't have to be ... it's a moot point.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Efficient Looping Constructs
by dws (Chancellor) on Sep 29, 2001 at 06:57 UTC
    Comparative benchmarks like this help settle numerous disagreements, and I appreciate and applaud the work that goes into them. Still, it does help if the code fragments being benchmarked do the same thing. The fragments tested do not. Consider   $sum=0; for (my $i=0;$i++<=$n;) {$sum+=$i} Side-effecting $i prematurely breaks the calculation (an off-by-$n error).

      My apologies. You are of course correct. In my defense I can only say that the two argument usage of the three argument for was a hasty addition, in order to be fair, after I noticed that its performance was better in other benchmarks that I did.

      I corrected the fragment to the below

      'for(;;)2' => '$sum=0; for (my $i=0;$i<=$n;) {$sum+=$i++}'
      Which in itself is an instructive point, converting a 3 argument into a while(){} or simply by converting it into the 2 of 3 argment form, achieves a considerable speedup.

      However while your point is fair I beleive that you will agree that in this case, luckily, this error did not affect the validity of the timings as the loop still performed the same set of iterations, and the sum was simply present to ensure (perhaps without reason) that perl did not optimize away empty blocks (perhaps inconsistently across the variation). The only difference being that as you said my error meant that that loop calculated 1..$n+1 instead of 0..$n.

      Nevertheless should I make a similer post I shall endeavour to ensure that I dont make this type of mistake again.

      :-)

      Yves
      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

Re: Efficient Looping Constructs
by danger (Priest) on Sep 30, 2001 at 00:49 UTC
    The inappropriate loop structure I see the most is the 3 argument for. Any time I see this being used to iterate over numbers from $lower to $higher I assume the author is inexperienced in the Perl world, when its an experienced member of the community I basically wonder what they were thinking.

    So, if I use for(my $i = 0; $i < 100; $i++){...} your assumption is that I am either inexperienced with Perl or perhaps not thinking clearly?

    Just what is the penalty for using the 'for(;;)' version? I've seen your benchmark, let's look at another that compares differences between the two constructs using both lexicals and globals for the summation variable (caps for globals):

      The most noticeable penalty of using for(;;) as a simple counting loop is that your code is less readable. Instead of using a C-ism that is unnecessary, why not just use while(1)? Or, while you're at it, why not just count in a more readable fashion? Which is more readable?
      my $sum = 0; for (my $i = 0; $i <= $num; $i++) { $sum += $i; } ##### my $sum = 0; foreach my $i (0 .. $num) { $sum += $i; }
      To me, LOW and HIGH are very clearly stated in the foreach. In the for-loop, there are more characters involved. In addition, those characters are semi-colons. This is the only place in C (or Perl) where the semi-colon does not indicate the end of a thought. It, instead, indicates the end of a sub-thought. You have a rule, then you have an exception to the rule that's used everywhere! I understand why they did it, and I understand the syntactic reasons for it, too. But, I do not think of for-loops as three separate statements within a block construct. All other block constructs are one statement-one thought. Why should for be any different?!?

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        The most noticeable penalty of using for(;;) as a simple counting loop is that your code is less readable. Instead of using a C-ism that is unnecessary, why not just use while(1)?

        /me wonders, with so very many C-isms that are part and parcel of the Perl language, why is the 'for(;;)' loop continually singled out as some kind of negative reflection on the programmer?

        OK, I guess I just don't buy the readability argument. Less readable for who? I would agree that the foreach version *looks* cleaner --- but the fact is that when I am 'reading' code (as opposed to just looking at it) my brain is in code-mode, and the C-style version is simply *not* any harder for me to read (or write for that matter). And it probably wouldn't be any harder for anyone else who knows the C-style version to read either (and there isn't any point comparing readability between those who know it and those who don't). Your suggestion to use a 'while' loop rather than an equivalent C-style for loop might make it *less* readable (the for loop has the key information all together in one place).

        In complete honesty, I think the difference in readability between the for(;;) and foreach forms of a counting loop is *less* than the readability difference between cuddled and uncuddled else clauses (for those who prefer one style over the other). Even if we personally feel very stongly about cuddling or not cuddling, we simply accept that some people will do it differently and we don't make a big deal about it (at least not in a prescriptive manner).

        I am not trying to be a cultural relativist here --- all ways aren't equally good. Using grep() to test if an element exists in a list is using the wrong tool for the job. But I simply can't see the same objection being applied to 'for(;;)' as a simple counting loop. A more relevant concern may be not whether one uses either version to iterate over a range of numbers, but whether they are iterating over numbers for the wrong reason, ie:

        #1: for (my $i = 0; $i < @array; $i++){ $array[$i] += 10; } #2: foreach my $i ( 0 .. $#array) { $array[$i] += 10; }

        The problem with both of these has nothing to do with style of the counting loop, but the fact that they use a counting loop at all.

        My point was never to argue that 'for(;;)' is better or worse than 'foreach' for counting loops. I was arguing against judging someone else's experience and understanding of Perl based on their choice of one construct. Especially when that argument is based on benchmarks that can be both misleading and are essentially vacuous with regard to the real world performance of an application.

        That being said, sometimes performance is a real factor. As a point of history, and as I've mentioned before, using foreach my $i (1 .. 1000000) may look clean and neat, but it wasn't always a very smart thing to do. Prior to version 5.005 (mid 1998), that loop built the million element list in memory and thus could have very *real* performance consequences (show stopping consequences in fact). Therefore, it is quite easy to imagine that someone could have learned this lesson (say, in 1996) and acquired a preference for the 'for(;;)' loop to iterate over ranges of numbers as part of their *Perl* vocabulary (not just as a leftover "C-ism") due to having *more* (or at least longer) experience with Perl (rather than less, as the original poster assumes).

Re: Efficient Looping Constructs
by rchiav (Deacon) on Sep 29, 2001 at 22:42 UTC
    I think it should also be noted that loops in general are ineffecient. Not that you should avoid them, but under certian circumstances you can optimize your code by not using one. I was a little reluctant to post this because I don't want to misdirect people to think that they should be trying to optimize away their loops. So I'll say that under very limited circumstances, you might want to consider this. Those limited circumstances would be when you notice that you have a substancial bottleneck where you're looping over a static number of elements (known at compile time).

    Here's an example..

    #!/usr/bin/perl -w use strict; use Benchmark; timethese(100000, { 'loop' =>'loop()', 'noloop' => 'noloop()' }); sub loop { my $x; for (1..10) { $x++; } } sub noloop { my $x; $x++; $x++; $x++; $x++; $x++; $x++; $x++; $x++; $x++; $x++; }
    Results:
    Benchmark: timing 100000 iterations of loop, noloop... loop: 4 wallclock secs ( 3.48 usr + 0.03 sys = 3.51 CPU) @ 28 +490.03/s ( n=100000) noloop: 0 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 91 +743.12/s ( n=100000)
    Now in this example, the for loop works just fine, but as you can see, not using a loop is three times as fast. %99.9 of the time, you're not going to have to optimize this unless you're calling this 100k times.

    Anyway, I just thought I'd throw it out there..

    Rich

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://115559]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (8)
As of 2024-04-24 12:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found