JaWi has asked for the wisdom of the Perl Monks concerning the following question:
My fellow monks, I'm calling on your greater knowledge once again!
I've been coding in Perl for about 3-4 years now, and I never paid much attention to writing fast Perl code. I recently started to `rethink' my written code/snippets and wondered about the performance of various approaches to the same functionality.
Most documents about Perl stress the various ways of writing code, but not the performance of those approaches (may I therefore assume these various ways don't affect a program's performance?).
Now for the real question: how do you, my fellow monks, optimize your Perl code? Do you follow specific approaches, explicitly avoiding certain constructs in your code? Or is it all magic?
My sincere gratitude,
-- JaWi
"A chicken is an egg's way of producing more eggs."
(jeffa) Re: Optimizing existing Perl code (in practise)
by jeffa (Bishop) on Aug 18, 2002 at 22:21 UTC
If it has been said once it has been said a thousand times
"beware of premature optimization!" Ask
yourself, "does this really need to be faster? Really?"
I think a very important item to optimize is code
maintainability - how easy is it to extend your program
and fix bugs that break your code?
So, how do i optimize my Perl code? I generally don't (but
i do try to get it right the first time - measure twice,
cut once).
If i do, it is to replace areas of wheel re-invention with
CPAN modules, or to refactor items into classes to
improve robustness. If i wanted faster code i would port it to C instead, but since most of what i write relies on database and web servers, Perl is 90% of the time not the
bottleneck.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
The fastest script in the world is worthless if a change in your
system's directory structure breaks your code and you can't fix it.
If you bother to optimize for anything, do it for maintainability.
But never forget "monitorability".
Unless your script is being called to do huge jobs, or your resources
are very restricted (Sparc Ultra 1 or Intel 486, etc.) optimization for
speed is not usually that big an issue.
However, thorough and correct logging of events, meaningful commentary
in the script itself, reusability of the code; these will all help
with maintainability.
Re: Optimizing existing Perl code (in practise)
by atcroft (Abbot) on Aug 18, 2002 at 22:07 UTC
Re: Optimizing existing Perl code (in practise)
by sauoq (Abbot) on Aug 18, 2002 at 22:50 UTC
The real trick isn't optimizing your code but optimizing your solution. Maybe you can write the same code three different ways, but if that code implements an O(N²) algorithm when there is an O(N) algorithm that will do, it doesn't matter much whether you shave a few microseconds off each iteration.
Successfully choosing the right algorithm takes careful consideration of the problem. If there is a secret to it at all it's probably choosing the right representation for your data. How to do that is a matter of experience and education. There isn't a cookbook solution to it because it usually depends greatly upon details of the problem you need to solve.
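The point above is easy to demonstrate in a few lines. A small sketch (the data and numbers are invented for illustration): testing membership of 21 needles against a 10,000-element list, once by scanning the array for every lookup (O(N) each) and once via a hash built up front (O(1) each). Both give the same answer; only the hash version stays fast as the data grows.

```perl
use strict;
use warnings;

my @haystack = (1 .. 10_000);
my @needles  = (9_990 .. 10_010);   # 11 of these are in the haystack

# O(N) per lookup: scan the whole array for each needle.
sub scan_lookup {
    my ($needle, $list) = @_;
    for my $item (@$list) {
        return 1 if $item == $needle;
    }
    return 0;
}

# O(1) per lookup: build a hash once, then test membership with exists.
my %seen = map { $_ => 1 } @haystack;

my $slow = grep { scan_lookup($_, \@haystack) } @needles;
my $fast = grep { exists $seen{$_} } @needles;

print "scan found $slow, hash found $fast\n";   # prints: scan found 11, hash found 11
```

Same results, but the hash representation turns every membership test from a full scan into a single lookup, which is exactly the "right representation for your data" argument.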
-sauoq
"My two cents aren't worth a dime.";
The real trick isn't optimizing your code but optimizing your solution.
this is a good paraphrase of my answer... hell yeah, i optimize my code... but it's the data bottlenecks i optimize, not the millisecond differences you'd only get by porting to C. when all i get is milliseconds, i purely write for readability/security/correctness, not speed.
i try to understand what my database does to retrieve/store data, so the requests i make of it can use indices, not full searches. i try to dump the results of a query into a perl hash if i reuse it, to avoid requerying. i try to keep the data ordered in such a way as to avoid the actual data structure causing problems, etc.
i find that for what i want to do, one or both of {perl, mysql} always has the data structure i want (in terms of efficiency of search, add, delete), so i can almost always avoid doing the hard work... that makes for efficiency of development on several levels. by knowing perl/mysql i have an easy answer to all my data structure problems; i just have to use the struct, not build one first. and it is readable by others simply by virtue of being standardized (i.e. SQL-92).
second, if there is an error in the data structure, the easy-to-fix ones are my fault (i used the struct wrong), and the hard ones are someone else's. if i'm using, say, some feature new to mysql 4.0.0alpha and it breaks, we just add it to the bug list and wait for 4.0.1 -- the fix is free; neither i nor my employer has to pay for me to debug, rewrite, debug, etc...
if i were the guy writing mysql it would be a different story, but it's not like they write it in perl either...
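the "dump query results into a perl hash to avoid requerying" idea above can be sketched without a real database. `expensive_query` here is an invented stand-in for a DBI call; the point is that the hash absorbs repeated requests:

```perl
use strict;
use warnings;

my $query_count = 0;

# Stand-in for a real DBI query; the name and the data are invented.
sub expensive_query {
    my ($id) = @_;
    $query_count++;
    return "row-$id";
}

# Cache results in a hash so repeated requests skip the "database".
my %cache;
sub cached_query {
    my ($id) = @_;
    $cache{$id} = expensive_query($id) unless exists $cache{$id};
    return $cache{$id};
}

cached_query($_) for 1, 2, 1, 2, 1;        # five requests...
print "ran $query_count real queries\n";   # prints: ran 2 real queries
```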
Re: Optimizing existing Perl code (in practise)
by derby (Abbot) on Aug 18, 2002 at 22:59 UTC
check out Effective Perl Programming - it's a great resource that shows you idiomatic perl which, 4 times out of 5, is faster than most of the other ways in TMTOWTDI.
-derby
Re: Optimizing existing Perl code (in practise)
by semio (Friar) on Aug 19, 2002 at 06:44 UTC
I found myself asking this question as well, based on some feedback I received from a recent question I posted -
converting hex to char. In that thread, unpack and printf were presented as options for converting data. To test the performance of each, I did the following:
#!c:/perl/bin/perl -w
use strict;
use POSIX qw(strftime);
my $x;
my $maxint = 200000;
my $start = strftime "%H:%M:%S", localtime;
for ($x=0; $x <$maxint;$x++) {
print unpack "H*", "abc"
}
my $finish = strftime "%H:%M:%S", localtime;
print "$start $finish";
Results: 01:32:57 01:33:48 (51 seconds)
#!c:/perl/bin/perl -w
use strict;
use POSIX qw(strftime);
my $x;
my $maxint = 200000;
my $start = strftime "%H:%M:%S", localtime;
for ($x=0; $x <$maxint;$x++) {
printf "%x%x%x",ord('a'),ord('b'),ord('c');
}
my $finish = strftime "%H:%M:%S", localtime;
print "$start $finish";
Results: 01:31:56 01:32:50 (54 seconds)
In this case, unpack is the winner, although the performance difference doesn't become apparent until after 100,000 iterations. So, in my opinion, given that TIMTOWTDI, I would look for a performance differential between such methods and opt for the one that requires the least execution time.
The second thing I would check is whether any shelling out can be replaced by an available Perl function. I recently wrote a program that required updating the date/time stamps in a log file. For this, I made the mistake of relying on shelling out:
my $time1 = `date '+%H:%M:%S'`;
when I should have used
my $time1 = strftime "%H:%M:%S", localtime;
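For what it's worth, the cost gap between the two can be measured directly with the Benchmark module. This is a small sketch assuming a Unix-like system with a date command on the PATH; the backtick version forks a shell plus /bin/date on every call, while strftime stays in-process:

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Benchmark qw(timethese);

# Both produce an HH:MM:SS stamp; only one pays for a fork per call.
timethese(200, {
    backtick => sub { my $t = `date '+%H:%M:%S'`; chomp $t; $t },
    strftime => sub { strftime "%H:%M:%S", localtime },
});
```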
Hope this helps.
cheers, -semio
use strict;
use Benchmark;
timethese(1500000,
{
'unpack' => 'unpack "H*", "abc"',
'sprintf' => 'sprintf "%x%x%x",ord("a"),ord("b"),ord("c")'
}
);
The Results:
Benchmark: timing 1500000 iterations of sprintf, unpack...
sprintf: 0 wallclock secs ( 0.17 usr + 0.00 sys = 0.17 CPU) @ 8823529.41/s (n=1500000)
(warning: too few iterations for a reliable count)
unpack: 10 wallclock secs ( 9.87 usr + 0.01 sys = 9.88 CPU) @ 151821.86/s (n=1500000)
ACCCK!!! Abigail-II caught me in a late-night brain seizure. I shoulda been tipped off by sprintf winning. :( ++Abigail-II
grep
Mynd you, mønk bites Kan be pretti nasti...
$ perl -MO=Deparse -wce 'sprintf "%x%x%x", ord ("a"), ord ("b"), ord ("c")'
Useless use of a constant in void context at -e line 1.
BEGIN { $^W = 1; }
'???';
-e syntax OK
$
Indeed, you just benchmarked how fast perl can do an empty loop.
Not very useful. Your benchmark should include assigning the result
to a variable. So, you might want to do:
#!/usr/bin/perl
use strict;
use warnings 'all';
use Benchmark;
timethese -10 => {
unpack => '$_ = unpack "H*" => "abc"',
sprintf => '$_ = sprintf "%x%x%x", ord ("a"), ord ("b"), ord ("c")',
}
__END__
Benchmark: running sprintf, unpack for at least 10 CPU seconds...
sprintf: 11 wallclock secs (10.25 usr + 0.00 sys = 10.25 CPU) @ 775053.56/s (n=7944299)
unpack: 11 wallclock secs (10.48 usr + 0.01 sys = 10.49 CPU) @ 331145.09/s (n=3473712)
It looks like sprintf is still a winner. But is it? Let's
check the deparser again:
$ perl -MO=Deparse -wce '$_ = sprintf "%x%x%x", ord "a", ord "b", ord "c"'
BEGIN { $^W = 1; }
$_ = '616263';
-e syntax OK
$
Oops. Perl is so smart, it figured out at compile time the result of the
sprintf. We'd have to make the arguments of sprintf
variable to make Perl actually do work at run time:
$ perl -MO=Deparse -wce '($a, $b, $c) = split // => "abc";
$_ = sprintf "%x%x%x", ord $a, ord $b, ord $c'
BEGIN { $^W = 1; }
($a, $b, $c) = split(//, 'abc', 4);
$_ = sprintf('%x%x%x', ord $a, ord $b, ord $c);
-e syntax OK
$
Only now can we run a fair benchmark:
#!/usr/bin/perl
use strict;
use warnings 'all';
use Benchmark;
use vars qw /$a $b $c $abc/;
$abc = "abc";
($a, $b, $c) = split // => $abc;
timethese -10 => {
unpack => '$_ = unpack "H*" => $::abc',
sprintf => '$_ = sprintf "%x%x%x", ord $::a, ord $::b, ord $::c',
}
__END__
Benchmark: running sprintf, unpack for at least 10 CPU seconds...
sprintf: 11 wallclock secs (10.51 usr + 0.01 sys = 10.52 CPU) @ 208379.75/s (n=2192155)
unpack: 10 wallclock secs (10.10 usr + 0.00 sys = 10.10 CPU) @ 323836.04/s (n=3270744)
And guess what? unpack is the winner!
The moral: no benchmark is better than a bad benchmark.
Abigail
The second thing I would check is whether any shelling out can be replaced by an available Perl function. I recently wrote a program that required updating the date/time stamps in a log file. For this, I made the mistake of relying on shelling out
This particular piece of advice is very good. A peeve of mine is when I see people who write Perl scripts in which all the work is done by system() calls. What is the point of writing a Perl script if you're not going to use the Perl functions? You might as well write the thing in shell.
Spawning system calls takes more resources, so it behooves the Perl programmer to code the functionality they want using Perl built-ins and modules.
gj! ++ on this one.
- Jim
Insert clever comment here...
A peeve of mine is when I see people who write Perl scripts and all
the work in them is done by using system() calls. What is the point in
writing a Perl script if you're not going to use the Perl functions?
You might as well write the thing in shell.
And a peeve of mine is people who see everything in black and white.
I've written Perl programs where the majority of the work was done
by "system". What's the point of using a glue language and not
gluing? You might as well write the thing in C.
Your point of view is quite opposite of the viewpoint of "code reuse".
Unix comes with a handy toolkit. There's nothing wrong with using it.
You might as well write the thing in shell.
Not always. Perl gives you more control flow syntax than a shell.
Spawning system calls does take more resources and thus it behooves the
Perl programmer to try and code the functionality they want using Perl
built-ins and modules.
Bull. Programming means making trade-offs between developer time and
run time. The fact that you have chosen Perl instead of, say, C, means
that you strongly favour developer time over run time. Your arguments make
sense if you are a C coder - but for a Perl coder they are just silly.
Really, what's the point of writing:
my $text = do {
open my $fh => $file or die "open: $!\n";
local $/;
<$fh>;
};
If you can just write:
my $text = `cat $file`;
Most programs won't read in gazillions of files in a single program, so
the extra overhead is minute. Far less than the sacrifice you already
made by using Perl instead of C. I also prefer
system mkdir => -p => $dir;
over the Perl equivalent. It takes too long to figure out which module
implements it, and to download and install it.
Of course, making use of external programs makes you less portable,
but so does making use of modules not coming with the core. And many
programs dealing with file names aren't portable anyway. Do you always
use File::Spec when dealing with file names? I certainly don't.
I'm not claiming everything should be done with system.
Not at all. But I don't think that everything that can be done in Perl
should be, or that system should therefore be avoided.
Abigail
Re: Optimizing existing Perl code (in practise)
by gmpassos (Priest) on Aug 19, 2002 at 11:47 UTC
Well, if you really want to make some code faster, write it as XS; in other words, write it in C. But this is only worth doing for filters, crypters, etc...
To gain speed, you can test different versions of your code, especially inside loops and pieces that will be run many times, to find the best way to write it! Here are some tips:
Variables:
Don't use:
$var = $var . "add" ;
The best way is:
$var .= "add" ;
The first way rewrites the entire variable in memory; the second only appends the new data. The same idea applies to: += , -= , *= , /=
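A hedged note on this tip: on older perls the copy form really does recopy the growing string on each iteration, while recent perls (5.28 and later, with the multiconcat optimization) compile `$var = $var . "..."` into an in-place append as well, so a quick Benchmark run is the honest way to see what your perl does:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Both subs build the same 10_000-character string; "copy" spells out
# the concatenation, "append" uses .= to extend the string in place.
cmpthese(-1, {
    copy   => sub { my $s = ""; $s = $s . "x" for 1 .. 10_000; $s },
    append => sub { my $s = ""; $s .= "x"     for 1 .. 10_000; $s },
});
```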
For subs, use the contents of @_ directly, especially for big data passed to the function. If you want speed, read $_[0] directly; if you need to change the data, use my ($var) = @_;, and if you pass big data, use shift.
Don't use for big data:
sub {
my ($var1,$var2) = @_ ;
}
The best way is to use $_[0] itself, or shift:
sub {
my $var1 = shift ;
my $var2 = shift ;
}
* Note that $_[0] and friends alias the caller's variables, so modifying them modifies the caller's data; copy into a scalar first if you don't want that.
If you have a loop (while, for, foreach) that will be run many times, try not to declare variables with my inside it:
Normal way:
for(0..10) {
my $var = $_ ;
}
Faster:
my $var ;
for(0..10) {
$var = $_ ;
}
* Of course this will only improve speed if you move the my outside the loop for all the variables, in other words for bigger code inside the loop.
Don't use local(); my() is faster! In Perl's early days local() was used like my, but now it's mainly useful for localizing globals and *HANDLES, not for declaring variables.
Prefer variables in this order: $scalar, @array, %hash. Sometimes we reach for %h or @a when they aren't needed, but they are slower than $s and use more memory, especially %h!
About regular expressions (REs): use them only when needed! Don't write if ($var =~ /x/) if you can write if ($var eq 'x'). But sometimes an RE can be faster than longer code; the best way to choose is to test both.
But remember that any tip here only buys you microseconds. Only spend time on speed in the pieces of your code that really need it! And always use what the core gives you; don't remake things that Perl itself already provides.
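Rather than taking any of these tips on faith, each one can be checked in a few lines with Benchmark. For example, the eq-versus-regex claim (a sketch; both comparisons return the same truth value for this data, so only speed differs):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $var = "x";

# An exact-match test written two ways.
cmpthese(-1, {
    regex => sub { $var =~ /^x$/ ? 1 : 0 },
    eq    => sub { $var eq 'x'   ? 1 : 0 },
});
```

The same pattern (two subs, one cmpthese call) works for the my-inside-versus-outside-the-loop and shift-versus-@_ tips as well.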
"The creativity is the expression of the liberty".
#!/usr/bin/perl -w
use strict;
use Benchmark qw(cmpthese);
sub shifter {
my $a=shift;
my $b=shift;
my $c=shift;
my $d=shift;
my $e=shift;
my $f=shift;
return $a*$b*$c*$d*$e*$f;
}
sub assigner {
my ($a,$b,$c,$d,$e,$f)=@_;
return $a*$b*$c*$d*$e*$f;
}
sub direct {
return $_[0]*$_[1]*$_[2]*$_[3]*$_[4]*$_[5];
}
cmpthese(-5,
{
'shifter' => sub {shifter(1,2,3,4,5,6);},
'assigner' => sub {assigner(1,2,3,4,5,6);},
'direct' => sub {direct(1,2,3,4,5,6);},
}
);
Results:
$ perl testSubs.pl
Benchmark: running assigner, direct, shifter, each for at least 2 CPU seconds...
assigner: 0 wallclock secs ( 2.06 usr + 0.02 sys = 2.08 CPU) @ 384577.33/s (n=800690)
direct: 3 wallclock secs ( 2.04 usr + 0.00 sys = 2.04 CPU) @ 629222.22/s (n=1285501)
shifter: 2 wallclock secs ( 2.09 usr + 0.00 sys = 2.09 CPU) @ 294563.31/s (n=616521)
Rate shifter assigner direct
shifter 294563/s -- -23% -53%
assigner 384577/s 31% -- -39%
direct 629222/s 114% 64% --
That's with perl 5.6.1... Maybe 5.8.0 optimized shift? But you'd have to keep the old values around and have a "front" entry in the AV, and I don't remember seeing anything about that.
--
Mike
Hi,
The shift option is good when you pass big data to a function! The operation itself is not fast, because it has to cut the value from the array, reorder the array, and save it into a scalar variable. But shift is good for big data because you don't keep the data in memory twice; you just move it into the scalar. If you want speed, read $_[0] first; if you need to change the data, use my ($var) = @_;, and if you have big data, use shift.
"The creativity is the expression of the liberty".
Re: Optimizing existing Perl code (in practise)
by JaWi (Hermit) on Aug 19, 2002 at 10:05 UTC
Re: Optimizing existing Perl code (in practise)
by feloniousMonk (Pilgrim) on Aug 19, 2002 at 18:02 UTC
I definitely think benchmarking is the key answer here.
I think no matter what, this is an implementation specific problem. I always wrote Perl for
programmer speed, and paid less attention to execution speed. Until I started working on problems
that were big enough to deal with datasets ranging from hundreds of meg to a few gig in size.
I love Perl, but for data this big, and the bit of processing required, I would initially have gone
with either C or C++. But I work in a place where almost everyone knows Perl and not many know C/C++,
so Perl optimization has become a big issue.
I've learned a lot about how slight code changes can increase efficiency, especially when
certain tasks need to be done many times over. I've seen major speed increases
just by benchmarking and trying a different solution, but keeping the same algorithm.
Things especially like:
my @a = ();
if ( $foo =~ /^(\d+)\s+(\w+)\s*$/ ) {
    @a = ($1, $2);
}
vs. my @a = split (/\s+/, $foo);
Guess what? In my system, option #1 runs about 90% faster.
-felonious
Those two code snippets are not at all similar in function, so
benchmarking them is useless.
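To make the difference concrete (a small sketch, not from the original posts): the two approaches agree on well-formed input, but diverge as soon as the input has leading whitespace or fields the pattern was written to reject, because split performs no validation:

```perl
use strict;
use warnings;

# On clean input the two approaches agree...
my $clean = "123 abc";
my @from_regex = $clean =~ /^(\d+)\s+(\w+)\s*$/ ? ($1, $2) : ();
my @from_split = split /\s+/, $clean;
print "@from_regex | @from_split\n";   # prints: 123 abc | 123 abc

# ...but not on input the pattern was meant to reject:
my $dirty = " 12.5 abc!";
@from_regex = $dirty =~ /^(\d+)\s+(\w+)\s*$/ ? ($1, $2) : ();
@from_split = split /\s+/, $dirty;
# The regex yields nothing; split yields ("", "12.5", "abc!"),
# including an empty leading field from the initial whitespace.
```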
Um, they do perform the same function. They both place 2 variables into an array....
Yes, the method is different but what I intended to illustrate is that for a given set of data,
2 different methods of processing may have significant performance differences while giving the same results.
Also implicit in the code is that the solution will not work everywhere, which is why optimization depends on what
you intend to optimize.
-felonious
--
Re: Optimizing existing Perl code (in practise)
by thoglette (Scribe) on Aug 20, 2002 at 12:29 UTC
As others have said:
- Write it and optimise only if it needs it
- Get your algorithms right first
Ninety percent of your code will be fast enough - only certain
blocks may need tweaking.
Case in point - on a recent project with over 1/2Mbyte of script and about 400 'instances' two 'instances' ran far too slowly.
Most 'instances' ran in under 10 seconds while these two required 60 minutes, which was unacceptable.
An analysis (See comments on monitoring) showed that we had the following:
while (1) {
    $thing = new thing;
    $thing->method(getc());
    print $thing->result();
    $thing->DESTROY;
}
All very well and good, but our class was heavily inherited, and new
executed no less than 60 lines of code, including multiple function calls.
result went all the way up the tree to an AUTOLOAD handler. And all for a 10-line method.
So, about 120 lines of code (and about 20 @INC function calls) to do 10 lines of work.
Time for some faster, locally optimised code AND VERY LOUD COMMENTS, both in the local code and in the class which was being 'broken'.
Net result was a run time of about 10 seconds, which was acceptable for this project.
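The fix described above can be sketched as follows, with an invented Thing class standing in for the real heavily-inherited one: construct once outside the loop and reuse the object, so the expensive constructor runs one time instead of once per input:

```perl
use strict;
use warnings;

package Thing;
my $constructions = 0;

# Trivial stand-ins for the real (expensive, inherited) methods.
sub new    { $constructions++; bless {}, shift }
sub method { my ($self, $c) = @_; $self->{out} = uc $c }
sub result { $_[0]{out} }
sub count  { $constructions }

package main;

# Hoisted: one construction, reused for every character of input.
my $thing = Thing->new;
my $out = "";
for my $char (split //, "abc") {
    $thing->method($char);
    $out .= $thing->result;
}
print "$out after ", Thing->count, " construction(s)\n";   # prints: ABC after 1 construction(s)
```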
--
Butlerian Jihad now!
Re: Optimizing existing Perl code (in practise)
by pingo (Hermit) on Aug 19, 2002 at 14:06 UTC
For my part, I don't do much optimizing. Instead, I rely on FastCGI to make my Perl scripts fast enough (of course, this only applies to CGI).