One of the things that we all love about Perl is TMTOWTDI. The options allow us to excerise our creativity in many ways, however it has its down side. Having so many options to choose from means we need to be all the more discriminatory as to the choices we make. An example is that recently I've been browsing through the Monastery archives and I've noticed is what to me are lots of inappropriate looping constructs being used. Read ineffecient for inappropriate. They seem to be used incorrectly either because the author comes from a C background and is unaware that looks are deceiving, or because the author misunderstands basic perl ideas.
Now before I go further lets deal with the 'If you want fast then you shouldn't be using Perl, use C' objections. Crap. That kind of logic would lead someone who has just discovered that their Perl bubblesort implementation is too slow to rwrite it in C, instead of doing the obvious and using perls built in sort. Why would I want to use code that is twice as slow as code that is just is easy to write in the same language?? Sure if overall kickass speed was the bottom line I wouldnt use Perl. But being aware of and taking advantage of the optimizations that Perl allows isn't wrong, its smart.
The inappropriate loop structure I see the most is the 3 argument for. Any time I see this being used to iterate over numbers from $lower to $higher I assume the author is inexperienced in the Perl world, when its an experienced member of the community I basically wonder what they were thinking.
The next is the inappropriate use of grep and map. grep seems to be the ideal subject for abuse. Lots of people seem to think of it as an 'element finder'. Uh, No. Its a filter. The idea is for it to be used on lots of elements to find subsets of the original elements, normally more than one. There are _much_ more efficient ways of finding a single element in an array. map() is abused in a similar way. It is NOT a for loop. Ok, if you're writing an obfu, or you _really_ dont care about speed then fine use map or grep. But its a piece of shorthand that should be used only rarely, that is when you need to transform most elements from one array into a different set of elements in a new array. Incidentally if you decide to use map or grep in void context then grep seems to be faster.
So heres what to keep in mind:
- Stay away from the 3 arg for loop. This actually is dealt with by perl as while(){}continue{} and accordingly has the overhead of creating two scoped blocks added in. Its very slow.
- If you must use the 3 arg for loop (loop from $highest to $lowest) *only* use the first two args. Perl is mart enough to convert these into while(){} and avoid the extra continue block.
- Map is a tranform function. Dont use it instead of a for loop unless you actually need to apply a function to each and every element in a loop. If you want to use it in void context then grep will probably be faster.
- Grep is a filter. Dont use it if you are looking for only one element. But for some reason its faster than map.
- If you dont need a scoped block then try to write looping code without it using modifiers. Creative use of the logical operators and the comma operator come in use here.
- for (LIST) is basically speaking the most efficient way to iterate. This includes the use of the '..' construct
So to put a little money where my mouth is here are some benchmarks.
use Benchmark 'cmpthese';
our $n=10;
our $sum=0;
cmpthese (-5,{
'grep{}' => '$sum=0; grep {$sum+=$_} 0..$n;',
'grep()' => '$sum=0; grep ($sum+=$_,0..$n);',
'map{}' => '$sum=0; map {$sum+=$_} 0..$n;',
'map()' => '$sum=0; map ($sum+=$_,0..$n);',
'for_modif' => '$sum=0; $sum+=$_ foreach 0..$n;',
'foreach' => '$sum=0; foreach my $i (0..$n) {$sum+=$i}
+',
'for(;;)3' => '$sum=0; for (my $i=0;$i<=$n;$i++) {$sum+
+=$i}',
'for(;;)2' => '$sum=0; for (my $i=0;$i++<=$n;) {$sum+=$
+i}',
'w{}c{}' => '$sum=0; my $i=0; while($i<=$n){$sum+=$i}
+continue{$i++}',
'while' => '$sum=0; my $i=0; while($i<=$n){$sum+=$i+
++};',
});
__END__
#last two columns removed for space reasons
Rate map{} grep{} map() for(;;)3 w{}c{} grep() for(;;)2 w
+hile
map{} 39416/s -- -3% -21% -34% -34% -41% -45%
+-47%
grep{} 40750/s 3% -- -18% -32% -32% -39% -44%
+-46%
map() 49897/s 27% 22% -- -16% -17% -25% -31%
+-33%
for(;;)3 59526/s 51% 46% 19% -- -0% -11% -18%
+-21%
w{}c{} 59804/s 52% 47% 20% 0% -- -10% -17%
+-20%
grep() 66614/s 69% 63% 34% 12% 11% -- -8%
+-11%
for(;;)2 72308/s 83% 77% 45% 21% 21% 9% --
+ -4%
while 75007/s 90% 84% 50% 26% 25% 13% 4%
+ --
foreach 80124/s 103% 97% 61% 35% 34% 20% 11%
+ 7%
for_modif 84073/s 113% 106% 68% 41% 41% 26% 16%
+ 12%
Now this all applies to 5.6, I'd be interested in knowing how it differs from earlier version.
Anyway, I suppose this is a bit of a rant, but its something that has gotten to me a bit lately, especially respected members of our community posting what I consider to be 'clever but BS' solutions to problems. To me if a solution is not as effecient as is reasonable then it shouldnt be posted.
:-)
Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
Re (tilly) 1: Efficient Looping Constructs
by tilly (Archbishop) on Sep 29, 2001 at 07:41 UTC
|
While I agree with what you want people to do, I think you
have hit on the single worst possible reason to do it.
Namely code
tuning.
There are far better reasons that you can and should point
out. For instance one of the most common errors in C is
a fencepost error. If you use C-style loops, it will be as
common in Perl as it is in C. (In fact you have a fencepost
error in your benchmark code.) If you loop using Perlish
foreach loops, this error practically vanishes.
So by using foreach style looping you can kill one of your
most common bugs. Not bad.
As for the abuses of map and grep, yes, they waste time
and memory. However they also make your code less clear.
They do more than a foreach loop does, and therefore you
have more to think about when you run across one.
That for me is a bigger deal than code tuning.
Oh, and a note. People say that they use map in golf
because it is shorter. Wrong.
map$_++,@_;
$_++for@_;
Even for golfing purposes, gratuitous abuse of map and grep
is usually misguided.
Now if you can write code using a hash lookup instead of a
grep, well that is not just code tuning, that is an
algorithmic improvement. When you start talking algorithmic
improvements, you get huge performance increases. But even
so a hash lookup is clearer. So even
if the program will run fast enough either way, I would
use the hash lookup for clarity.
And everything that I just said about writing using Perl's
syntactic sugar is sufficient for me to use it, regardless
of whether it was faster. And it is important to think
that way. Because if you talk to people about how fast
constructs are, and teach them to think at that level,
before you know it they will miss the forest for the
trees.
Why would any performance oriented person use a
hash instead of an array for a structure? Array access
uses less memory and runs faster! But a hash is self
documenting. It is faster to debug. You make fewer
mistakes. What it costs in computer time and energy
is more than made up for in human time.
I want people to use hashes. I want that because I don't
want to waste my time wading through buggy and unreadable
positionally based logic. And it doesn't happen until
programmers understand that there is such a thing as
"fast enough" and from then on their time is worth more
than the computer's time. (Besides which, worrying about
maintainability gives you more leisure to profile and find
bigger speed increases later. Worrying about speed at
every step is likely to result in a slower program in the
end.) | [reply] [d/l] |
|
Hear hear! My personal cycles are much more valuable than those of a computer.
The three main bottlenecks to program execution speed are (in order of increasing cost):
- RAM
- CPU
- Program efficiency
It may not be politically-correct and it may not be the thing to say, but inefficient programs that are "good enough" are, well, "good enough". You don't have to achieve perfection. If your program executes faster than the human mind can register, but uses a bubble-sort ... who cares?!? It's not scalable, but if it doesn't have to be ... it's a moot point.
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
| [reply] |
Re: Efficient Looping Constructs
by dws (Chancellor) on Sep 29, 2001 at 06:57 UTC
|
Comparative benchmarks like this help settle numerous disagreements, and I appreciate and applaud the work that goes into them. Still, it does help if the code fragments being benchmarked do the same thing. The fragments tested do not. Consider
$sum=0; for (my $i=0;$i++<=$n;) {$sum+=$i}
Side-effecting $i prematurely breaks the calculation (an off-by-$n error).
| [reply] [d/l] [select] |
|
'for(;;)2' => '$sum=0; for (my $i=0;$i<=$n;) {$sum+=$i++}'
Which in itself is an instructive point, converting a 3 argument into a while(){} or simply by converting it into the 2 of 3 argment form, achieves a considerable speedup.
However while your point is fair I beleive that you will agree that in this case, luckily, this error did not affect the validity of the timings as the loop still performed the same set of iterations, and the sum was simply present to ensure (perhaps without reason) that perl did not optimize away empty blocks (perhaps inconsistently across the variation). The only difference being that as you said my error meant that that loop calculated 1..$n+1 instead of 0..$n.
Nevertheless should I make a similer post I shall endeavour to ensure that I dont make this type of mistake again.
:-)
Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM) | [reply] [d/l] |
Re: Efficient Looping Constructs
by danger (Priest) on Sep 30, 2001 at 00:49 UTC
|
The inappropriate loop structure I see the most is the 3 argument
for. Any time I see this being used to iterate over numbers from
$lower to $higher I assume the author is inexperienced in the Perl
world, when its an experienced member of the community I basically
wonder what they were thinking.
So, if I use for(my $i = 0; $i < 100; $i++){...} your
assumption is that I am either inexperienced with Perl or perhaps not
thinking clearly?
Just what is the penalty for using the 'for(;;)' version? I've seen
your benchmark, let's look at another that compares differences
between the two constructs using both lexicals and globals for the
summation variable (caps for globals):
| [reply] [d/l] [select] |
|
The most noticeable penalty of using for(;;) as a simple counting loop is that your code is less readable. Instead of using a C-ism that is unnecessary, why not just use while(1)? Or, while you're at it, why not just count in a more readable fashion? Which is more readable?
my $sum = 0;
for (my $i = 0; $i <= $num; $i++) {
$sum += $i;
}
#####
my $sum = 0;
foreach my $i (0 .. $num) {
$sum += $i;
}
To me, LOW and HIGH are very clearly stated in the foreach. In the for-loop, there are more characters involved. In addition, those characters are semi-colons. This is the only place in C (or Perl) where the semi-colon does not indicate the end of a thought. It, instead, indicates the end of a sub-thought. You have a rule, then you have an exception to the rule that's used everywhere! I understand why they did it, and I understand the syntactic reasons for it, too. But, I do not think of for-loops as three separate statements within a block construct. All other block constructs are one statement-one thought. Why should for be any different?!?
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement. | [reply] [d/l] [select] |
|
The most noticeable penalty of using for(;;) as a simple counting
loop is that your code is less readable. Instead of using a C-ism
that is unnecessary, why not just use while(1)?
/me wonders, with so very many C-isms that are part and parcel of the
Perl language, why is the 'for(;;)' loop continually singled out as
some kind of negative reflection on the programmer?
OK, I guess I just don't buy the readability argument.
Less readable for who? I would agree that the foreach version
*looks* cleaner --- but the fact is that when I am 'reading' code (as
opposed to just looking at it) my brain is in code-mode, and the
C-style version is simply *not* any harder for me to read (or write for
that matter). And it probably wouldn't be any harder for anyone else
who knows the C-style version to read either (and there isn't any
point comparing readability between those who know it and those who
don't). Your suggestion to use a 'while' loop rather than an
equivalent C-style for loop might make it *less* readable (the for
loop has the key information all together in one place).
In complete honesty, I think the difference in readability between
the for(;;) and foreach forms of a counting loop is *less* than
the readability difference between cuddled and uncuddled else
clauses (for those who prefer one style over the other). Even if we
personally feel very stongly about cuddling or not cuddling, we
simply accept that some people will do it differently and we don't
make a big deal about it (at least not in a prescriptive manner).
I am not trying to be a cultural relativist here --- all ways aren't
equally good. Using grep() to test if an element exists in a list is
using the wrong tool for the job. But I simply can't see the same
objection being applied to 'for(;;)' as a simple counting loop. A
more relevant concern may be not whether one uses either version to
iterate over a range of numbers, but whether they are iterating over
numbers for the wrong reason, ie:
#1:
for (my $i = 0; $i < @array; $i++){
$array[$i] += 10;
}
#2:
foreach my $i ( 0 .. $#array) {
$array[$i] += 10;
}
The problem with both of these has nothing to do with style of the
counting loop, but the fact that they use a counting loop at all.
My point was never to argue that 'for(;;)' is better or worse than
'foreach' for counting loops. I was arguing against judging someone
else's experience and understanding of Perl based on their choice of
one construct. Especially when that argument is based on benchmarks
that can be both misleading and are essentially vacuous with
regard to the real world performance of an application.
That being said, sometimes performance is a real factor.
As a point of history, and as I've mentioned before,
using foreach my $i (1 .. 1000000) may look clean and
neat, but it wasn't always a very smart thing to do. Prior to
version 5.005 (mid 1998), that loop built the million element list in
memory and thus could have very *real* performance consequences (show
stopping consequences in fact). Therefore, it is quite easy to
imagine that someone could have learned this lesson (say, in 1996)
and acquired a preference for the 'for(;;)' loop to iterate over
ranges of numbers as part of their *Perl* vocabulary (not just as a
leftover "C-ism") due to having *more* (or at least longer)
experience with Perl (rather than less, as the original poster
assumes).
| [reply] [d/l] [select] |
|
|
|
Re: Efficient Looping Constructs
by rchiav (Deacon) on Sep 29, 2001 at 22:42 UTC
|
I think it should also be noted that loops in general are ineffecient. Not that you should avoid them, but under certian circumstances you can optimize your code by not using one. I was a little reluctant to post this because I don't want to misdirect people to think that they should be trying to optimize away their loops. So I'll say that under very limited circumstances, you might want to consider this. Those limited circumstances would be when you notice that you have a substancial bottleneck where you're looping over a static number of elements (known at compile time).
Here's an example..
#!/usr/bin/perl -w
use strict;
use Benchmark;
timethese(100000, {
'loop' =>'loop()',
'noloop' => 'noloop()'
});
sub loop {
my $x;
for (1..10) {
$x++;
}
}
sub noloop {
my $x;
$x++;
$x++;
$x++;
$x++;
$x++;
$x++;
$x++;
$x++;
$x++;
$x++;
}
Results:
Benchmark: timing 100000 iterations of loop, noloop...
loop: 4 wallclock secs ( 3.48 usr + 0.03 sys = 3.51 CPU) @ 28
+490.03/s (
n=100000)
noloop: 0 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 91
+743.12/s (
n=100000)
Now in this example, the for loop works just fine, but as you can see, not using a loop is three times as fast. %99.9 of the time, you're not going to have to optimize this unless you're calling this 100k times.
Anyway, I just thought I'd throw it out there..
Rich | [reply] [d/l] [select] |
|
|