PerlMonks  

Question about benchmarking

by jryan (Vicar)
on Aug 17, 2001 at 01:55 UTC ( [id://105540]=perlquestion )

jryan has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about how to benchmark code.

The Background Info:
I have recently created a weblog analyser for my high school. As a feature for it, I have created a "hierarchy view" so the webmaster can easily browse his way through the site, finding the number of hits a certain directory/file/etc. has gotten. You can see it here. Click on a side caret (or as some like to call it, a "greater than symbol") link at the end of a line to expand a directory, or click the plus sign to add to a watched link (don't do that, or if you do, delete it. It's only for the webmaster).

The Question:
Sometimes, this "hierarchy view" runs kinda slow. It seems to run slower for larger directories; however, that doesn't make any sense, since the script still has to sort through the same amount of data for a smaller directory. I would like to know where my script is hanging up so I can possibly fix it, or find out if it is just the browser taking a while to show the information. Is there any way to do this?

Thanks!

Replies are listed 'Best First'.
Re: Question about benchmarking
by kschwab (Vicar) on Aug 17, 2001 at 02:02 UTC
    You could:
    • Post the code, and let us help
    • Try running it under the debugger
    • Have a look at Devel::DProf, a perl profiler
    I'm guessing you've already taken a step back and reviewed the code, following the code paths and loops ?
Re: Question about benchmarking
by perrin (Chancellor) on Aug 17, 2001 at 02:29 UTC
    As mentioned above, Devel::DProf is the best solution for finding things in your code that are slowing it down. However, if that looks too daunting to you, just use Time::HiRes with some print statements to tell you how long a particular section of code took. That will also allow you to time the full execution of your script, to determine if it's the script or the browser that's slowing you down.
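The Time::HiRes suggestion above can be sketched roughly as follows. This is a minimal sketch, not the poster's code: `time_section` is a hypothetical helper name, and the two timed blocks are stand-ins for real sections of a script. Timings go to STDERR so that, in a CGI script, they should end up in the web server's error log rather than in the page.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code block and return elapsed wall-clock seconds.
sub time_section {
    my ($label, $code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    my $elapsed = tv_interval($t0);    # seconds between $t0 and now
    printf STDERR "%-12s %.6f s\n", $label, $elapsed;
    return $elapsed;
}

my $script_start = [gettimeofday];

# Stand-ins for real work (reading the data file, building the hierarchy).
time_section('section one', sub { my $n = 0; $n += $_ for 1 .. 100_000 });
time_section('section two', sub { my @s = sort { $a <=> $b } reverse 1 .. 50_000 });

# Total run time of the script itself; if this is small but the page still
# feels slow, the time is probably being spent in the network or the browser.
printf STDERR "%-12s %.6f s\n", 'total', tv_interval($script_start);
```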
Re: Question about benchmarking
by LD2 (Curate) on Aug 17, 2001 at 02:37 UTC
    Along with all the other suggestions, you can also use Devel::SmallProf - which is a per line Perl profiler.
Re: Question about benchmarking
by jryan (Vicar) on Aug 17, 2001 at 04:04 UTC

    Thanks for the compliment, this is my first "big" perl project :). The code is still at work (I forgot to bring it home), but I'll remember to get it tomorrow and post it.
    Just a few questions:

    1. What exactly does Devel::DProf do? I couldn't make sense of any of the output that the sample was showing. It seemed scary. :( Devel::SmallProf looked better, but not much.
    2. Time::HiRes seemed closer to what I need, and I think I'll try that out.
    3. In a perl book that I have at home, I came across a built-in Benchmark module, but it seemed kinda odd. Does anyone know anything about it? Its usage was something like:
      use Benchmark;
      $start = new Benchmark;
      run_around_in_circle(1000);
      $end = new Benchmark;
      print timestr(timediff($end, $start));
      That seems way too simple to work, I don't trust it. Also, do you think that Benchmark or Time::HiRes would slow my program down significantly?

    Thanks again.

      There is an example of using the Benchmark module to compare different algorithms at this node on searching for an array index.

      One difficult part of benchmarking is keeping track of what you have changed, and what effect these changes had. Here is one way to do it:

      First, isolate the part of your code that needs improvement in a subroutine. Then, create new subroutines with different names to try different approaches. Preserve the different implementations of the subroutine in the program. This ensures that you don't accidentally change something else in your code that ruins your comparison data. The Benchmark node mentioned above shows an example of this approach.
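The approach described above might look something like this. It is only a sketch under assumptions: the two subroutines (`index_by_loop`, `index_by_grep`) and the sample list are made up for illustration and are not taken from the node mentioned. Both implementations stay in the program side by side, and `cmpthese` from the standard Benchmark module compares them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: find the index of one element in a list.
my @list   = map { "item$_" } 0 .. 999;
my $target = 'item750';

# Implementation 1: explicit loop that stops at the first match.
sub index_by_loop {
    for my $i (0 .. $#list) {
        return $i if $list[$i] eq $target;
    }
    return -1;
}

# Implementation 2: grep over the indices (always scans the whole list).
sub index_by_grep {
    my ($i) = grep { $list[$_] eq $target } 0 .. $#list;
    return defined $i ? $i : -1;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    loop => \&index_by_loop,
    grep => \&index_by_grep,
});
```

Because both versions return the same value, it is easy to check that a faster variant still gives the right answer before trusting its timing.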

      If you keep your subroutine short, it is easy for other people, such as Perl Monks, to help speed up your code.

      It should work perfectly the first time! - toma

      Answer to number 3: run (under windows)
      perldoc Benchmark
      to read the POD for the module. If you have a somewhat old and/or broken install (like me), you can always fall back to
      vi \perl\lib\Benchmark.pm
      :)

      What you've written is a proper way of using Benchmark. Ideally, you would insert several objects into your code to get the times you wanted, but you can do it in a clever and unobtrusive manner:

      BEGIN {
          use Benchmark;
          my ($oldb, $newb);    # inside BEGIN for closure
          sub mybench {
              return unless ($opt_b);
              $newb = new Benchmark;
              if ($oldb) {      # i.e. don't run first time through
                  # timediff takes (later, earlier); timestr makes it printable
                  print timestr(timediff($newb, $oldb)), "\n";
              }
              $oldb = $newb;
          }
      }
      use Getopt::Std;
      getopts('b');
      mybench();
      # some code here
      mybench();
      # some more code here
      mybench();
      and then it would be trivial to remove the benchmarking from your production code, either through a command line switch, as above:
      mycode.pl -b
      or some line in your makefile like:
      perl -lne 'print unless /^\s*mybench\(\);/' myscript.pl > production.pl
      ya know.
      To understand DProf, look at the man page for dprofpp, which is included in the distribution. The output of Devel::DProf is not meant to be looked at by humans -- you're supposed to run dprofpp on it.

      You don't trust it because it's simple?!

      Well, I guess you still don't have that much experience, but here's a universal truth: the best code is usually the most simple. Usually.

      Anyways, I don't know exactly how your program works, but I would first suggest using your friend, Devel::DProf.

      ( I'm not sure what you didn't understand about the output from Devel::DProf, but it's pretty straightforward, I think.... It just shows in order which subroutines were accessed most frequently )

      Thanks for that information about benchmarking, that is exactly what I needed. Btw, what I meant by "I don't trust it because it is too simple" is that it looked like too simple of a solution to what looked to be a complex problem, but really wasn't.

      For those who still want to see the code, here it is (I've removed all of the stuff that doesn't have to do with the hierarchy part, to cut down on confusion):

      #!/usr/bin/perl
      # ***********************
      # Name: hierarchy.pl
      # Author: Joe Ryan
      # Date Finished: August 10th, 2001
      # Where Used: http://amherst.k12.oh.us/cgi-bin/weblog/hierarchy.pl
      # Description: An extension to mainlog.pl that lets the
      # user browse through the amherst website, showing the hits
      # per directory and per file. Note that the script
      # doesn't directly run off the weblog, but rather a text
      # file that is created after processing the weblog.

      open(DATA, 'pagehits.txt');
      @mlines = <DATA>;
      close(DATA);
      $cutat = 2;

      # recursive function that prints the hierarchy
      sub print_hierarchy {
          my ($newlevelref, $newhitsref, $x, $stophere, $plvars, $temp, $temp1, $resume) = @_;
          my (@newlevel) = @$newlevelref;
          my (@newhits) = @$newhitsref;
          if ($x<=$cutlevelat) {
              print "<ul>\n";
              $spaces = "<li>";
              $spacesend = "</li>";
              my ($y)=0;
              for ($y=0; $y < @newlevel; $y++) {
                  my ($newleveltemp) = $newlevel[$y];
                  my ($newhitstemp) = $newhits[$y];
                  my ($temp2) = $temp.$newleveltemp;
                  my ($temp3) = $temp1.$newleveltemp;
                  my ($p_count) = ($newleveltemp =~ tr/\.//);
                  if ($p_count < 1) {$temp2.="/"; $temp3.="/";}
                  eval ("\$plvars1 = \$plvars\.\"&pathlevel".$x."=\$newleveltemp\"");
                  print "$spaces $newhitstemp&nbsp;<a href=\"$temp2\">$newleveltemp</a>&nbsp;<a href=\"/cgi-bin/weblog/hierarchy.pl?&cutlevelat=".($x+1)."$plvars1\">></a>&nbsp;<a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp3&addhits=$newhitstemp&cutlevelat=".($x+1)."$plvars\">+</a>$spacesend\n" unless ($newleveltemp =~ /\./ || $newleveltemp eq "");
                  if ($y == $stophere) {
                      eval ("print_hierarchy(\\\@newlevel".($x).", \\\@newhits".($x).", ".($x+1).", \$stophere".($x+1).", \$plvars1, \$temp2, \$temp3, 1)");
                  }
              }
              print "$spaces<b>--------------</b>$spacesend\n";
              for ($y=0; $y < @newlevel; $y++) {
                  my($newleveltemp) = $newlevel[$y];
                  my($newhitstemp) = $newhits[$y];
                  my($temp2) = $temp.$newleveltemp;
                  my($temp3) = $temp1.$newleveltemp;
                  eval ("\$plvars1 = \$plvars\.\"&pathlevel1=\$newleveltemp\"");
                  my($p_count) = ($newleveltemp =~ tr/\.//);
                  if ($p_count < 1) {$temp2.="/"; $temp3.="/";}
                  print "$spaces $newhitstemp <a href=\"$temp2\">$newleveltemp</a>"."&nbsp;<a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp3&addhits=$newhitstemp&cutlevelat=0$plvars\">+</a>\n" if ($newleveltemp =~ /\./ && $newleveltemp ne "");
              }
              print "</ul>\n";
          }
          else {
              return 1;
          }
      }

      use CGI;
      $query = CGI::new();

      # $cutlevelat is how many levels out a directory is from the base
      $cutlevelat = $query->param("cutlevelat");
      $cutlevelat=0 if (!$cutlevelat);
      for ($i=0; $i<$cutlevelat; $i++) {
          eval ("\$pathlevel".$i." = \$query->param(\"pathlevel".$i."\")");
      }
      print "Content-type: text/html\n\n<html><head><title>Hierarchy View</title></head><body bgcolor=\"#FFFFFF\">\n";
      @mypath = ("");
      @myhits = (0);
      $y=0;
      $v=0;
      for ($i=0; $i<$cutlevelat; $i++) {
          eval("\$n".$i."=0");
      }
      for($i=0; $i<@mlines; $i++) {
          @entry = split (' ', $mlines[$i]);
          @path = split (/\//, $entry[0]);
          $temp="";
          $x=1;
          for (; $x<=$cutat;$x++) {
              $path1[$x-1]=$path[$x-1];
              $temp.=$path[$x-1];
              $temp.= "/";
          }
          $breakcheck=1;
          for ($c=0; $c<$cutlevelat; $c++) {
              $breakcheck=1;
              eval("\$pathlevel=\$pathlevel".$c);
              if ($path[$#path1+$c] eq $pathlevel) {
                  eval("\$meep=\@newlevel".$c);
                  for($z=0; $z<$meep; $z++) {
                      eval ("\$narf=\$newlevel".$c."[\$z]");
                      if ($path[$#path1+1+$c] eq $narf) {
                          eval ("\$newhits".$c."[\$z]+=\$entry[1]");
                          $breakcheck=0;
                      }
                  }
                  if ($breakcheck) {
                      eval("\$newlevel".$c."[\$n".$c."] = \$path[\$\#path1+1+".$c."]");
                      eval("\$newhits".$c."[\$n".$c."] = \$entry[1]");
                      eval("\$n".$c."++");
                  }
              }
          }
          $entry[0]=$temp;
          $breakcheck=1;
          for($x=0; $x<@mypath; $x++) {
              if ($entry[0] eq $mypath[$x]) {
                  $myhits[$x]+=$entry[1];
                  $breakcheck=0;
              }
          }
          if ($breakcheck) {
              $mypath[$t]=$entry[0];
              $myhits[$t]=$entry[1];
              $t++;
          }
      }
      @indices = (0 .. $#myhits);
      @sorted_indices = sort {$myhits[$b] <=> $myhits[$a]} @indices;
      @myhits = @myhits[@sorted_indices];
      @mypath = @mypath[@sorted_indices];
      for ($i=0; $i<$cutlevelat; $i++) {
          eval("\@indices = (0 .. \$\#newhits".$i.")");
          eval("\@sorted_indices = sort {\$newhits".$i."[\$b] <=> \$newhits".$i."[\$a]} \@indices");
          eval("\@newhits".$i." = \@newhits".$i."[\@sorted_indices]");
          eval("\@newlevel".$i." = \@newlevel".$i."[\@sorted_indices]");
      }

      # find where the next level of the hierarchy is supposed to go
      for ($i=0; $i<$cutlevelat-1; $i++) {
          eval ("\$stophere".($i+1)."=0");
          eval ("\$meep=\@newlevel".$i);
          eval ("\$pathlevel=\$pathlevel".($i+1));
          for ($x=0; $x<$meep; $x++) {
              eval ("\$narf=\$newlevel".($i)."[\$x]");
              if ($narf eq $pathlevel) {
                  eval ("\$stophere".($i+1)."=\$x");
              }
          }
      }
      print "<h1>Individual Page Hits</h1>\n<table border=\"1\" cellpadding=\"5\">\n";
      for($i=0; $i<@mypath; $i++) {
          print "<tr><td>\n";
          my (@path) = split (/\//, $mypath[$i]);
          print $myhits[$i]."&nbsp;";
          print "<a href=\"http://www.amherst.k12.oh.us/\">Amherst Steele</a>";
          $temp = "http://www.amherst.k12.oh.us/";
          $temp1 = "/";
          if ($path[1] eq ""){$stop = 1;}
          $x=1;
          for ($x=1; $x < @path && $x < $cutat && $x != $stop; $x++) {
              print "&nbsp;>&nbsp;";
              $temp .= $path[$x];
              $temp1 .= $path[$x];
              $p_count = ($path[$x] =~ tr/\.//);
              if ($p_count < 1) {$temp.="/"; $temp1.="/";}
              print "<a href=\"".$temp."\">".$path[$x]."</a>";
          }
          $plvars = "&pathlevel0=$path[$#path]";
          $endpath = $path[$#path];
          if ($path[$#path] eq $pathlevel0) {
              print_hierarchy(\@newlevel0, \@newhits0, 1, $stophere1, $plvars, $temp, $temp1, $resume);
          }
          $p_count = ($endpath =~ tr/\.//);
          if ($p_count < 1) {
              print "&nbsp;<a href=\"/cgi-bin/weblog/hierarchy.pl?&cutlevelat=1$plvars\">></a>&nbsp;";
          }
          $stop=0;
          print "&nbsp;<a href=\"/cgi-bin/weblog/hierarchy.pl?addwatch=$temp1&addhits=$myhits[$i]&cutat=$cutat\">+</a>";
          print "\n</td></tr>\n";
      }
      print "</table></body></html>";

      I'm sure I can scope a lot of these variables a little better, and I'm sure that there are some more efficient ways to do some of the things I was doing (via a built-in function, etc.). If anyone sees anything at all to help spruce it up, by all means tell me!
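One possible direction, offered only as a sketch and not as a drop-in replacement: the script above looks up each path by scanning the parallel arrays @mypath and @myhits for every input line, and a hash keyed by path would turn that scan into a single lookup. The sample lines and the sub name `hits_by_top_level` are made up for illustration; the only thing taken from the script is the "path hits" layout that it splits on whitespace. The grouping here (top-level directory only) is simpler than what the script does with $cutat.

```perl
use strict;
use warnings;

# Made-up sample lines in the "path hits" layout the script splits on.
my @mlines = (
    "/sports/index.html 12\n",
    "/sports/teams/roster.html 5\n",
    "/library/index.html 7\n",
    "/index.html 30\n",
);

# Sum hits per top-level path component with one hash lookup per line.
sub hits_by_top_level {
    my @lines = @_;
    my %hits;
    for my $line (@lines) {
        my ($path, $count) = split ' ', $line;
        next unless defined $count;              # skip blank or short lines
        my (undef, $top) = split m{/}, $path;    # first field is "" before the leading slash
        $top = '' unless defined $top;
        $hits{"/$top"} += $count;
    }
    return \%hits;
}

# Sort the keys by hit count, highest first, instead of sorting index arrays.
my $hits = hits_by_top_level(@mlines);
for my $dir (sort { $hits->{$b} <=> $hits->{$a} } keys %$hits) {
    print "$hits->{$dir} $dir\n";
}
```

A hash of hashes (or of arrays) keyed by level could likewise stand in for the numbered @newlevel0, @newlevel1, ... variables that are currently built with string eval, but whether that helps the run time is something to confirm with one of the profiling approaches discussed in this thread.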

Re: Question about benchmarking
by jepri (Parson) on Aug 17, 2001 at 02:23 UTC
    Sweet program. I have no help beyond "maybe the recursion is slowing you down"

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.
