http://qs321.pair.com?node_id=40941


in reply to Efficient Perl Programming

If you want your program to run faster, my advice is to use Devel::OpProf. This ties right into the perlfaq3 question, "How do I make my Perl program run faster?" The advice given there is to improve your algorithm. Okay, but how do you do that? One of the best ways to compare different ways of doing things is to look at the resources they take up. Benchmark.pm will tell you how fast a code fragment runs, but that information isn't very portable. On a machine with lots of RAM, you might be developing programs that seem very fast to you but lag on other people's machines. Devel::OpProf counts the operations your code executes, so it lets you identify these bottlenecks even when they aren't slowing you down.

As an example, let's say we want to make a list of 1000 elements, with each element set to 1.

#!/usr/bin/perl -w
use strict;
use Devel::OpProf qw( profile print_stats zero_stats );
use Benchmark qw( timethese );

profile(1);

print "*** one() ***\n";
my @one = one();
print_stats();
zero_stats();

print "\n*** two() ***\n";
my @two = two();
print_stats();
zero_stats();

print "\n*** three() ***\n";
my @three = three();
print_stats();
zero_stats();

print "\n*** four() ***\n";
my @four = four();
print_stats();
zero_stats();

timethese( 0, {
    test_one   => 'one()',
    test_two   => 'two()',
    test_three => 'three()',
    test_four  => 'four()'
} );

sub one {
    my @list;
    for ( my $i = 0; $i < 1000; $i++ ) {
        $list[$i] = 1;
    }
    return @list;
}

sub two {
    my @list;
    for ( 0..999 ) {
        $list[$_] = 1;
    }
    return @list;
}

sub three {
    my @list = map { 1 } (1..1000);
    return @list;
}

sub four {
    my @list;
    @list[0..999] = (1) x 1000;
    return @list;
}

On my machine, this outputs:
*** one() ***
private variable         3002
constant item            2003
next statement           1009
private array            1004
numeric lt (<)           1001
logical and (&&)         1001
scalar assignment        1001
preincrement (++)        1000
iteration finalizer      1000
array element            1000
pushmark                 8
glob value               4
subroutine entry         3
conditional expression   1
list assignment          1
block entry              1
array dereference        1
loop entry               1
loop exit                1
return                   1
print                    1

*** two() ***
next statement           2009
private array            1004
constant item            1003
foreach loop iterator    1001
logical and (&&)         1001
array element            1000
scalar variable          1000
iteration finalizer      1000
scalar assignment        1000
pushmark                 9
glob value               5
subroutine entry         3
block entry              1
list assignment          1
conditional expression   1
array dereference        1
foreach loop entry       1
loop exit                1
return                   1
print                    1

*** three() ***
null operation           1002
constant item            1002
map iterator             1000
block                    1000
pushmark                 11
next statement           8
private array            4
glob value               4
subroutine entry         3
array dereference        2
list assignment          2
map                      1
block entry              1
conditional expression   1
return                   1
print                    1

*** four() ***
pushmark                 12
next statement           9
private array            5
glob value               4
constant item            4
subroutine entry         3
array dereference        2
list assignment          2
repeat (x)               1
conditional expression   1
null operation           1
array slice              1
block entry              1
return                   1
print                    1
Benchmark: running test_four, test_one, test_three, test_two, each for at least 3 CPU seconds...
  test_one:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 274.68/s (n=868)
  test_two:  3 wallclock secs ( 3.25 usr +  0.00 sys =  3.25 CPU) @ 359.08/s (n=1167)
test_three:  3 wallclock secs ( 3.01 usr +  0.00 sys =  3.01 CPU) @ 408.64/s (n=1230)
 test_four:  3 wallclock secs ( 3.27 usr +  0.01 sys =  3.28 CPU) @ 501.83/s (n=1646)

So, while Benchmark can tell us which is faster, it doesn't really tell us why. Devel::OpProf shows us that the difference comes from how many operations each version executes: temporary variables, &&'s, assignments, etc.
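
To make that comparison easier to read at a glance, here is a rough sketch that adds up the counts in each section of the output. total_ops() is a hypothetical helper of my own, not something Devel::OpProf provides, and it assumes the text is laid out exactly like the output pasted above. A raw total is only a crude summary, since not every op costs the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# total_ops() is a hypothetical helper, not part of Devel::OpProf.
# It takes text laid out like the output above ("*** name ***" headers
# followed by "op description   count" lines) and returns a list of
# [ name, total ] pairs in the order the sections appear.
sub total_ops {
    my ($text) = @_;
    my ( @totals, $current );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\*\*\*\s+(\S+)\s+\*\*\*/ ) {
            # New section header: start a fresh running total.
            $current = [ $1, 0 ];
            push @totals, $current;
        }
        elsif ( $current and $line =~ /^.+?\s+(\d+)\s*$/ ) {
            # Stats line: the count is the last field on the line.
            $current->[1] += $1;
        }
    }
    return @totals;
}

# The three() and four() sections, copied from the output above.
my $sample = <<'END';
*** three() ***
null operation           1002
constant item            1002
map iterator             1000
block                    1000
pushmark                 11
next statement           8
private array            4
glob value               4
subroutine entry         3
array dereference        2
list assignment          2
map                      1
block entry              1
conditional expression   1
return                   1
print                    1

*** four() ***
pushmark                 12
next statement           9
private array            5
glob value               4
constant item            4
subroutine entry         3
array dereference        2
list assignment          2
repeat (x)               1
conditional expression   1
null operation           1
array slice              1
block entry              1
return                   1
print                    1
END

# For these two sections the totals come to 4043 and 48.
printf "%-10s %6d ops\n", @$_ for total_ops($sample);
```

Paste in the one() and two() sections as well if you want all four totals side by side.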

(While not really relevant to the point of this post, I should point out that a better idea than the slice in #4 is probably to just write my @list = (1) x 1000;, because it is more readable. But you are welcome to profile them to see if there is an algorithmic difference... ;)
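
If you want to take me up on that, here is a minimal sketch that pits the slice against the plain list assignment. The name five() and the use of cmpthese are additions for illustration only. It uses nothing but the core Benchmark module, so it tells you which is faster on your machine, not why; drop the two subs into the script above if you want the op counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# four() is the slice version from the script above.
sub four {
    my @list;
    @list[0..999] = (1) x 1000;
    return @list;
}

# five() is a hypothetical name for the simpler form suggested above.
sub five {
    my @list = (1) x 1000;
    return @list;
}

# Run each version for at least 1 CPU second and print a comparison
# table. Timings vary by machine, so no expected output is shown.
cmpthese( -1, {
    slice  => sub { my @l = four() },
    direct => sub { my @l = five() },
} );
```

Code references are used here instead of the strings passed to timethese above, so the subs are called directly without a string eval.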

Paris Sinclair    |    4a75737420416e6f74686572
pariss@efn.org    |    205065726c204861636b6572
http://sinclairinternetwork.com