jryan has asked for the wisdom of the Perl Monks concerning the following question:
I have a question about how to benchmark code.
The Background Info:
I have recently created a weblog analyser for my high school. As a feature, I have created a "hierarchy view" so the webmaster can easily browse his way through the site, finding the number of hits a certain directory/file/etc. has gotten. You can see it here. Click on a side caret (or as some like to call it, a "greater than symbol") link at the end of a line to expand a directory, or click the plus sign to add to a watched link (don't do that, or if you do, delete it. It's only for the webmaster).
The Question:
Sometimes, this "hierarchy view" runs kinda slow. It seems to run slower for larger directories; however, that doesn't make any sense, since the script still has to sort through the same amount of data for a smaller directory. I would like to know where my script is hanging up so I can possibly fix it, or find out if it is just the browser taking a while to show the information. Is there any way to do this?
Thanks!
Re: Question about benchmarking
by kschwab (Vicar) on Aug 17, 2001 at 02:02 UTC
You could:
- Post the code, and let us help
- Try running it under the debugger
- Have a look at Devel::DProf, a Perl profiler
I'm guessing you've already taken a step back and reviewed the code, following the code paths and loops?
Re: Question about benchmarking
by perrin (Chancellor) on Aug 17, 2001 at 02:29 UTC
As mentioned above, Devel::DProf is the best solution for finding things in your code that are slowing it down. However, if that looks too daunting to you, just use Time::HiRes with some print statements to tell you how long a particular section of code took. That will also allow you to time the full execution of your script, to determine if it's the script or the browser that's slowing you down.
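A minimal sketch of that approach, assuming Time::HiRes is installed (it ships with Perl 5.8 and later). The summing loop is a hypothetical stand-in for whichever section of the script you want to time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Record when the whole script started so total run time can be reported.
my $script_start = [gettimeofday];

# Hypothetical stand-in for one section of the real script
# (e.g. reading pagehits.txt); replace it with the code you want to time.
my $t0    = [gettimeofday];
my $total = 0;
$total += $_ for 1 .. 200_000;
my $section_secs = tv_interval($t0);    # elapsed seconds as a float

# Report on STDERR so the timings don't end up in the CGI's HTML output.
printf STDERR "section took %.6f s\n", $section_secs;
printf STDERR "script took  %.6f s\n", tv_interval($script_start);
```

Comparing the total script time with how long the page takes to appear in the browser gives a rough idea of how much of the delay is the script itself. Under a web server, STDERR output usually lands in the server's error log.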
Re: Question about benchmarking
by LD2 (Curate) on Aug 17, 2001 at 02:37 UTC
Along with all the other suggestions, you can also use Devel::SmallProf, which is a per-line Perl profiler.
Re: Question about benchmarking
by jryan (Vicar) on Aug 17, 2001 at 04:04 UTC
Thanks for the compliment; this is my first "big" Perl project :). The code is still at work (I forgot to bring it home), but I'll remember to get it tomorrow and post it. Just a few questions:
- What exactly does Devel::DProf do? I couldn't make sense of any of the output that the sample was showing. It seemed scary. :( Devel::SmallProf looked better, but not much.
- Time::HiRes seemed closer to what I need, and I think I'll try that out.
- In a Perl book that I have at home, I came across a built-in Benchmark module, but it seemed kinda odd. Does anyone know anything about it? Its usage was something like:
use Benchmark;
$start=new Benchmark;
run_around_in_circle(1000);
$end=new Benchmark;
print timestr(timediff($end, $start));
That seems way too simple to work; I don't trust it. Also, do you think that Benchmark or Time::HiRes would slow my program down significantly?
Thanks again.
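One rough way to answer the overhead question is to time the timing calls themselves. A minimal sketch, assuming Benchmark and Time::HiRes are both available; the call count is arbitrary and the numbers will vary by machine:

```perl
use strict;
use warnings;
use Benchmark;
use Time::HiRes qw(gettimeofday tv_interval);

my $calls = 10_000;    # arbitrary; more calls give a steadier average

# Time $calls Benchmark->new calls (each one samples the current times).
my $t0 = [gettimeofday];
my @marks;
push @marks, Benchmark->new for 1 .. $calls;
my $bench_cost = tv_interval($t0) / $calls;

# Time $calls gettimeofday calls the same way.
$t0 = [gettimeofday];
my @stamps;
push @stamps, [gettimeofday] for 1 .. $calls;
my $hires_cost = tv_interval($t0) / $calls;

printf "Benchmark->new: %.2f microseconds per call\n", $bench_cost * 1e6;
printf "gettimeofday:   %.2f microseconds per call\n", $hires_cost * 1e6;
```

Multiply the per-call figure by the number of timing calls you actually add per request; a handful of calls costing microseconds each should be negligible next to a page that takes seconds.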
There is an example of using the Benchmark module
to compare different algorithms at
this node on
searching for an array index.
One difficult part of benchmarking
is keeping track of what you have changed, and what
effect these changes had. Here is one way to do it:
First, isolate the part of your code that needs
improvement in a subroutine.
Then, create new subroutines with different names
to try different approaches.
Preserve the different implementations of the subroutine
in the program. This ensures that you don't accidentally
change something else in your code that ruins your
comparison data. The Benchmark node mentioned above
shows an example of this approach.
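A minimal sketch of that workflow with Benchmark's cmpthese, assuming a Benchmark version that exports cmpthese (current ones do). The data and the two subroutines, find_index_loop and find_index_hash, are hypothetical examples of "two approaches to the same job":

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1,000 directory names and one to look up.
my @dirs   = map { "dir$_" } 1 .. 1000;
my $target = 'dir750';

# Version 1: linear scan through an array reference.
sub find_index_loop {
    my ($want, $list) = @_;
    for my $i (0 .. $#$list) {
        return $i if $list->[$i] eq $want;
    }
    return -1;
}

# Version 2: build a name => index hash once, then look names up directly.
my %index_of;
@index_of{@dirs} = (0 .. $#dirs);

sub find_index_hash {
    my ($want) = @_;
    return exists $index_of{$want} ? $index_of{$want} : -1;
}

# Both versions stay in the program, so they can be re-checked later.
die "versions disagree\n"
    unless find_index_loop($target, \@dirs) == find_index_hash($target);

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    loop => sub { find_index_loop($target, \@dirs) },
    hash => sub { find_index_hash($target) },
});
```

A negative count such as -1 tells cmpthese to run each version for at least that many CPU seconds instead of a fixed number of iterations. Note that the hash version builds its lookup table once, outside the timed code; if your real code would rebuild it on every call, time that as well.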
If you keep your subroutine short, it is easy for
other people, such as Perl Monks, to help
speed up your code.
It should work perfectly the first time! - toma
Answer to number 3: run (under Windows)
perldoc Benchmark
to read the POD for the module. If you have a somewhat old and/or broken install (like me), you can always fall back to
vi \perl\lib\Benchmark.pm
:)
What you've written is a proper way of using Benchmark. Ideally, you would insert several Benchmark objects into your code to get the times you wanted, but you can do it in a clever and unobtrusive manner:
BEGIN {
    use Benchmark;
    my ($oldb, $newb); # inside BEGIN for closure
    sub mybench {
        return unless ($opt_b);
        $newb = Benchmark->new;
        if ($oldb) { # i.e. don't print the first time through
            # timediff(later, earlier); timestr() formats the difference
            print timestr(timediff($newb, $oldb)), "\n";
        }
        $oldb = $newb;
    }
}
use Getopt::Std;
getopts('b');
mybench();
# some code here
mybench();
# some more code here
mybench();
and then it would be trivial to remove the benchmarking from your production code, either through a command line switch, as above:
mycode.pl -b
or some line in your makefile like:
perl -lne 'print unless /^\s*mybench\(\);/' myscript.pl > production.pl
ya know.
To understand DProf, look at the man page for dprofpp, which is included in the distribution. The output of Devel::DProf is not meant to be looked at by humans -- you're supposed to run dprofpp on it.
You don't trust it because it's simple?!
Well, I guess you don't have that much experience yet, but here's a universal truth: the best code is usually the simplest. Usually.
Anyway, I don't know exactly how your program works, but I would first suggest using your friend, Devel::DProf.
(I'm not sure what you didn't understand about the output from Devel::DProf, but it's pretty straightforward, I think... It just lists the subroutines in order of how much time was spent in each, which is dprofpp's default, or by how often they were called if you use dprofpp -l.)
Thanks for that information about benchmarking; that is exactly what I needed. Btw, what I meant by "I don't trust it because it is too simple" is that it looked like too simple a solution to what looked to be a complex problem, but really wasn't.
For those who still want to see the code, here it is (I've removed all of the stuff that doesn't have to do with the hierarchy part, to cut down on confusion):
#!/usr/bin/perl
# ***********************
# Name: hierarchy.pl
# Author: Joe Ryan
# Date Finished: August 10th, 2001
# Where Used: http://amherst.k12.oh.us/cgi-bin/weblog/hierarchy.pl
# Description: An extension to mainlog.pl that lets the
# user browse through the amherst website, showing the hits
# per directory and per file. Note that the script
# doesn't directly run off the weblog, but rather a text
# file that is created after processing the weblog.
open(DATA, 'pagehits.txt') or die "Can't open pagehits.txt: $!";
@mlines = <DATA>;
close(DATA);
$cutat = 2;
# recursive function that prints the hierarchy
sub print_hierarchy
{
my ($newlevelref, $newhitsref, $x, $stophere, $plvars, $temp, $temp1, $resume) = @_;
my (@newlevel) = @$newlevelref;
my (@newhits) = @$newhitsref;
if ($x<=$cutlevelat)
{
print "<ul>\n";
$spaces = "<li>";
$spacesend = "</li>";
my ($y)=0;
for ($y=0; $y < @newlevel; $y++)
{
my ($newleveltemp) = $newlevel[$y];
my ($newhitstemp) = $newhits[$y];
my ($temp2) = $temp.$newleveltemp;
my ($temp3) = $temp1.$newleveltemp;
my ($p_count) = ($newleveltemp =~ tr/\.//);
if ($p_count < 1) {$temp2.="/"; $temp3.="/";}
eval ("\$plvars1 = \$plvars\.\"&pathlevel".$x."=\$newleveltemp\"");
print "$spaces $newhitstemp <a href=\"$temp2\">$newleveltemp</a>"
    . " <a href=\"/cgi-bin/weblog/hierarchy.pl?&cutlevelat=" . ($x+1) . "$plvars1\">></a>"
    . " <a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp3&addhits=$newhitstemp&cutlevelat=" . ($x+1) . "$plvars\">+</a>$spacesend\n"
    unless ($newleveltemp =~ /\./ || $newleveltemp eq "");
if ($y == $stophere)
{
eval ("print_hierarchy(\\\@newlevel".($x).", \\\@newhits".($x).", ".($x+1).", \$stophere".($x+1).", \$plvars1, \$temp2, \$temp3, 1)");
}
}
print "$spaces<b>--------------</b>$spacesend\n";
for ($y=0; $y < @newlevel; $y++)
{
my($newleveltemp) = $newlevel[$y];
my($newhitstemp) = $newhits[$y];
my($temp2) = $temp.$newleveltemp;
my($temp3) = $temp1.$newleveltemp;
eval ("\$plvars1 = \$plvars\.\"&pathlevel1=\$newleveltemp\"");
my($p_count) = ($newleveltemp =~ tr/\.//);
if ($p_count < 1) {$temp2.="/"; $temp3.="/";}
print "$spaces $newhitstemp <a href=\"$temp2\">$newleveltemp</a>"
    . " <a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp3&addhits=$newhitstemp&cutlevelat=0$plvars\">+</a>\n"
    if ($newleveltemp =~ /\./ && $newleveltemp ne "");
}
print "</ul>\n";
}
else { return 1; }
}
use CGI;
$query = CGI::new();
# $cutlevelat is how many levels out a directory is from the base
$cutlevelat = $query->param("cutlevelat");
$cutlevelat=0 if (!$cutlevelat);
for ($i=0; $i<$cutlevelat; $i++)
{
eval ("\$pathlevel".$i." = \$query->param(\"pathlevel".$i."\")");
}
print "Content-type: text/html\n\n<html><head><title>Hierarchy View</title></head><body bgcolor=\"#FFFFFF\">\n";
@mypath = ("");
@myhits = (0);
$y=0;
$v=0;
for ($i=0; $i<$cutlevelat; $i++)
{
eval("\$n".$i."=0");
}
for($i=0; $i<@mlines; $i++)
{
@entry = split (' ', $mlines[$i]);
@path = split (/\//, $entry[0]);
$temp="";
$x=1;
for (; $x<=$cutat;$x++)
{
$path1[$x-1]=$path[$x-1];
$temp.=$path[$x-1];
$temp.= "/";
}
$breakcheck=1;
for ($c=0; $c<$cutlevelat; $c++)
{
$breakcheck=1;
eval("\$pathlevel=\$pathlevel".$c);
if ($path[$#path1+$c] eq $pathlevel)
{
eval("\$meep=\@newlevel".$c);
for($z=0; $z<$meep; $z++)
{
eval ("\$narf=\$newlevel".$c."[\$z]");
if ($path[$#path1+1+$c] eq $narf)
{
eval ("\$newhits".$c."[\$z]+=\$entry[1]");
$breakcheck=0;
}
}
if ($breakcheck)
{
eval("\$newlevel".$c."[\$n".$c."] = \$path[\$\#path1+1+".$c."]");
eval("\$newhits".$c."[\$n".$c."] = \$entry[1]");
eval("\$n".$c."++");
}
}
}
$entry[0]=$temp;
$breakcheck=1;
for($x=0; $x<@mypath; $x++)
{
if ($entry[0] eq $mypath[$x])
{
$myhits[$x]+=$entry[1];
$breakcheck=0;
}
}
if ($breakcheck)
{
$mypath[$t]=$entry[0];
$myhits[$t]=$entry[1];
$t++;
}
}
@indices = (0 .. $#myhits);
@sorted_indices = sort {$myhits[$b] <=> $myhits[$a]} @indices;
@myhits = @myhits[@sorted_indices];
@mypath = @mypath[@sorted_indices];
for ($i=0; $i<$cutlevelat; $i++)
{
eval("\@indices = (0 .. \$\#newhits".$i.")");
eval("\@sorted_indices = sort {\$newhits".$i."[\$b] <=> \$newhits".$i."[\$a]} \@indices");
eval("\@newhits".$i." = \@newhits".$i."[\@sorted_indices]");
eval("\@newlevel".$i." = \@newlevel".$i."[\@sorted_indices]");
}
# find where the next level of the hierarchy is supposed to go
for ($i=0; $i<$cutlevelat-1; $i++)
{
eval ("\$stophere".($i+1)."=0");
eval ("\$meep=\@newlevel".$i);
eval ("\$pathlevel=\$pathlevel".($i+1));
for ($x=0; $x<$meep; $x++)
{
eval ("\$narf=\$newlevel".($i)."[\$x]");
if ($narf eq $pathlevel)
{
eval ("\$stophere".($i+1)."=\$x");
}
}
}
print "<h1>Individual Page Hits</h1>\n<table border=\"1\" cellpadding=\"5\">\n";
for($i=0; $i<@mypath; $i++)
{
print "<tr><td>\n";
my (@path) = split (/\//, $mypath[$i]);
print $myhits[$i]." ";
print "<a href=\"http://www.amherst.k12.oh.us/\">Amherst Steele</a>";
$temp = "http://www.amherst.k12.oh.us/";
$temp1 = "/";
if ($path[1] eq ""){$stop = 1;}
$x=1;
for ($x=1; $x < @path && $x < $cutat && $x != $stop; $x++)
{
print " > ";
$temp .= $path[$x];
$temp1 .= $path[$x];
$p_count = ($path[$x] =~ tr/\.//);
if ($p_count < 1) {$temp.="/"; $temp1.="/";}
print "<a href=\"".$temp."\">".$path[$x]."</a>";
}
$plvars = "&pathlevel0=$path[$#path]";
$endpath = $path[$#path];
if ($path[$#path] eq $pathlevel0)
{
print_hierarchy(\@newlevel0, \@newhits0, 1, $stophere1, $plvars, $temp, $temp1, $resume);
}
$p_count = ($endpath =~ tr/\.//);
if ($p_count < 1)
{
print " <a href=\"/cgi-bin/weblog/hierarchy.pl?&cutlevelat=1$plvars\">></a> ";
}
$stop=0;
print " <a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp1&addhits=$myhits[$i]&cutat=$cutat\">+</a>";
print "\n</td></tr>\n";
}
print "</table></body></html>";
I'm sure I can scope a lot of these variables a little better, and I'm sure that there are some more efficient ways to do some of the things I was doing (via a built-in function, etc.). If anyone sees anything at all to help spruce it up, by all means tell me!
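A likely hotspot in the script above is the main loop: for every line of pagehits.txt it runs several string evals (each of which compiles a fresh piece of Perl) and scans @mypath and the @newlevelN arrays linearly. A minimal sketch of the same per-directory totals using plain hashes instead of eval-generated variable names, assuming pagehits.txt holds one "path hits" pair per line as the split in the script suggests. The sample data, %hits, %children and show_level are hypothetical, and the sketch does not reproduce the HTML output:

```perl
use strict;
use warnings;

# Hypothetical sample of pagehits.txt: one "path hits" pair per line.
my @lines = (
    "/sports/football/index.html 10",
    "/sports/football/scores.html 5",
    "/sports/index.html 7",
    "/news/index.html 3",
);

my %hits;       # full path => total hits at or below that path
my %children;   # directory => { entry name => 1, ... }

for my $line (@lines) {
    my ($path, $count) = split ' ', $line;
    my $is_dir = $path =~ m{/$};             # "/sports/" names a directory
    my @parts  = grep { length } split m{/}, $path;
    my $parent = '/';
    for my $i (0 .. $#parts) {
        # Every part except a trailing file name is a directory: add a "/".
        my $name = ($i < $#parts || $is_dir) ? "$parts[$i]/" : $parts[$i];
        $children{$parent}{$name} = 1;
        $hits{"$parent$name"} += $count;     # plain hash update, no eval
        $parent .= $name;
    }
}

# List the entries directly under one directory, busiest first.
sub show_level {
    my ($dir) = @_;
    my @names = sort { $hits{"$dir$b"} <=> $hits{"$dir$a"} }
                keys %{ $children{$dir} || {} };
    printf "%6d  %s%s\n", $hits{"$dir$_"}, $dir, $_ for @names;
}

show_level('/');
show_level('/sports/');
```

With the totals in hashes, expanding a directory is one lookup in %children rather than another pass over every log line. Whether this removes the slowdown you saw is worth confirming with Benchmark or Devel::DProf before and after the change.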
Re: Question about benchmarking
by jepri (Parson) on Aug 17, 2001 at 02:23 UTC