Advice on make my programme faster

smilly has asked for the wisdom of the Perl Monks concerning the following question:

Dear Friends, I wrote a Perl programme which builds a matrix, but it runs slowly, indeed slower than the main module it is applied in. Any suggestions to make it faster? I'd really appreciate it.

#! /usr/local/bin/perl -w 

use strict;
use warnings;
use WordNet::QueryData;
use WordNet::Similarity::random;
use WordNet::Similarity::path;
use WordNet::Similarity::wup;
use WordNet::Similarity::lch;
use WordNet::Similarity::jcn;
use WordNet::Similarity::res;
use WordNet::Similarity::lin;
use WordNet::Similarity::hso;
use WordNet::Similarity::lesk;
use WordNet::Similarity::vector; 
use WordNet::Similarity::vector_pairs;
use Data::Dumper;

my $Infile = shift;
my $Outfile = shift;
my $Measure = shift;
my (@sim , $simi);

unless (defined $Infile and defined $Outfile and defined $Measure) {
 print STDERR "Undefined input\n";
 print STDERR "Usage: simmat.pl inputfile outputfile measure(WordNet::Similarity::path)\n";
 exit 1;
}
 print STDERR "Loading WordNet... \nLoading WordNet::QueryData... ";
my $wn = WordNet::QueryData->new;
die "Unable to create WordNet object.\n" if(!$wn);
print STDERR "done.\n";


open (INPUT, "$Infile") || die "can't open the input file";
chomp (my @words = <INPUT>);
close (INPUT) ;

for my $i (0 .. $#words) {
    for my $j ( ($i) .. $#words) {
        $sim[$i][$j] = similarity( $words[$i], $words[$j]);
        $sim[$j][$i] = $sim[$i][$j];
           }
}

sub similarity {
   my ( $w1, $w2 ) = @_;
   $simi = 1;
   my $obj = $Measure -> new($wn);
   my $simi = $obj-> getRelatedness("$w1", "$w2");
   return $simi;
   }

open (OUTPUT, ">$Outfile") || die "can't open the output file";
print OUTPUT Dumper(\@sim);
close(OUTPUT);

Replies are listed 'Best First'.
Re: Advice on make my programme faster by suaveant (Parson) on Feb 11, 2008 at 18:38 UTC
Well, if similarity gets called often with the same arguments, then Memoize could help by caching values to prevent recalculation. Another thing to look into is Devel::SmallProf, which shows you how many times each line is run and how long your program spends running it, to help you find slowdowns. You may also find a speedup if you can create `$Measure->new($wn)` only once instead of at each call, but I'm not sure whether the module allows that, as I'm not familiar with it. - Ant
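A minimal sketch of the Memoize suggestion, for illustration only. The `similarity` sub here is a hypothetical stand-in that just counts its own calls (it does not touch WordNet), and the NORMALIZER assumes the measure is symmetric so that (a, b) and (b, a) can share one cache entry.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical stand-in for $obj->getRelatedness; it counts how often
# the slow path actually runs.
my $calls = 0;
sub similarity {
    my ($w1, $w2) = @_;
    $calls++;
    return length($w1) + length($w2);   # placeholder score, not a real measure
}

# Assumption: the measure is symmetric, so sort the arguments to give
# (a, b) and (b, a) the same cache key.
memoize('similarity', NORMALIZER => sub { join chr(28), sort @_ });

similarity('cat', 'dog');
similarity('dog', 'cat');   # served from the cache
similarity('cat', 'dog');   # served from the cache
print "slow calls: $calls\n";
```

Whether this helps depends on how often the same pair really recurs; with a duplicate-free word list and a half-matrix loop, every pair is only requested once anyway.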
Re: Advice on make my programme faster by kyle (Abbot) on Feb 11, 2008 at 19:07 UTC
Have a look at Profiling your code. Profiling will tell you what parts of your program use the most time, and you can focus your efforts there. If you find the results of profiling confusing, we can help with that.
Re: Advice on make my programme faster by roboticus (Chancellor) on Feb 11, 2008 at 21:51 UTC
smilly: You may be solving the wrong problem altogether. If your word list is very large (as I suspect it is), you'll be calculating a *huge* set of values. For many problems, you won't need the entire matrix. It may be better to compute them as needed and store them in the array when computed. That way, you compute only the ones you want, and once only. Something like:

    sub get_similarity {
        my $i = shift;
        my $j = shift;
        # Matrix is symmetric around x==y
        ($i,$j) = ($j,$i) if $i > $j;
        return $sim[$i][$j] if defined $sim[$i][$j];
        $sim[$i][$j] = similarity($words[$i], $words[$j]);
    }

Extra credit if you:
Use a hash instead of an array.
Tie your array or hash to a dbm file so you keep your values between runs...

...roboticus
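The "extra credit" idea (a hash cache tied to a dbm file) might look roughly like the sketch below. It assumes SDBM_File, which ships with Perl; DB_File or GDBM_File would be alternatives. The `similarity` stub, the tab-joined key format, and the temporary cache path are all hypothetical choices made for the example.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # core module; other dbm back ends also work
use File::Temp qw(tempdir);

my $calls = 0;
sub similarity {                  # hypothetical stand-in for the WordNet call
    my ($w1, $w2) = @_;
    $calls++;
    return length($w1) * length($w2);
}

# Look a pair up in the cache hash, computing it only on a miss.
sub get_similarity {
    my ($cache, $w1, $w2) = @_;
    ($w1, $w2) = ($w2, $w1) if $w1 gt $w2;   # symmetric: one key per pair
    my $key = "$w1\t$w2";                    # assumes words contain no tabs
    $cache->{$key} = similarity($w1, $w2) unless defined $cache->{$key};
    return $cache->{$key};
}

my $file = tempdir(CLEANUP => 1) . "/simcache";   # hypothetical cache location

for my $run (1 .. 2) {                            # simulate two program runs
    tie my %cache, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $file: $!";
    get_similarity(\%cache, 'cat', 'dog');
    get_similarity(\%cache, 'dog', 'cat');
    untie %cache;
}
print "slow calls across both runs: $calls\n";
```

The second pass through the loop finds the value already on disk, so the stub is called only once in total. In a real script the cache file would live somewhere permanent rather than in a temporary directory.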
Re: Advice on make my programme faster by lima1 (Curate) on Feb 11, 2008 at 20:44 UTC
Move `my $obj = $Measure->new($wn);` before the for loop (Update: suaveant already mentioned this, sorry). Check your word list. Does it contain duplicated lines (you should never have to use Memoize)? Try this to spot errors:

    sub similarity {
        my ( $w1, $w2 ) = @_;
        warn "'$w1' vs '$w2'";
        return $obj->getRelatedness($w1, $w2);
    }
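To make the effect of hoisting the constructor concrete, here is a small self-contained sketch. `Fake::Measure` is a hypothetical stand-in class (not part of WordNet::Similarity) that only mimics the `new($wn)` / `getRelatedness` shape used in the original script and counts how many objects get built.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a measure class; it only counts constructions
# so the effect of building the object once is visible.
package Fake::Measure;
my $constructed = 0;
sub new            { my ($class, $wn) = @_; $constructed++; return bless { wn => $wn }, $class }
sub getRelatedness { my ($self, $w1, $w2) = @_; return $w1 eq $w2 ? 1 : 0.5 }
sub constructed    { return $constructed }

package main;

my @words = qw(cat dog bird);
my $obj   = Fake::Measure->new('wn-handle');   # built once, before the loops

my @sim;
for my $i (0 .. $#words) {
    for my $j ($i .. $#words) {
        # Reuse the same object for every pair and mirror the result.
        $sim[$i][$j] = $sim[$j][$i] = $obj->getRelatedness($words[$i], $words[$j]);
    }
}
print "objects constructed: ", Fake::Measure->constructed(), "\n";
```

With the constructor inside `similarity`, the same loops would build one object per pair instead of one in total; how much time that costs with a real measure is something a profiler would have to confirm.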
Re^2: Advice on make my programme faster by smilly (Novice) on Feb 11, 2008 at 21:27 UTC
Thanks, good advice about moving $obj out of the loop.
Re: Advice on make my programme faster by downer (Monk) on Feb 12, 2008 at 05:03 UTC
If WordNet similarity is symmetric (I suspect it is), then you only need to compute half your matrix (j = 1 .. #words-1, i = j+1 .. #words). Additionally, because language follows a power law, it stands to reason that you are comparing some pairs of words more often than others. Maybe you can keep a hash to check whether you've seen a pair of words before; if so, you don't need to look it up again. I don't know if this will gain you much, but it may be worth a try. This is what I can think of for now. Try profiling; that usually gives you an idea of what to start with. ~downer
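One way to combine the points about repeated words and the half matrix is to drop duplicate lines first and then fill only the upper triangle. The sketch below assumes a symmetric measure; the word list and the `similarity` stub are hypothetical and exist only to count calls.

```perl
use strict;
use warnings;

my @words = qw(cat dog cat bird dog);   # hypothetical input with repeated lines

# Keep the first occurrence of each word, preserving order.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

my $calls = 0;
sub similarity {                        # stand-in for the expensive call
    my ($w1, $w2) = @_;
    $calls++;
    return $w1 eq $w2 ? 1 : 0;
}

# Upper triangle only, mirrored into the lower triangle.
my @sim;
for my $i (0 .. $#unique) {
    for my $j ($i .. $#unique) {
        $sim[$i][$j] = $sim[$j][$i] = similarity($unique[$i], $unique[$j]);
    }
}
printf "%d words, %d unique, %d similarity calls\n",
    scalar @words, scalar @unique, $calls;
```

For this input that is 6 calls, against 15 for the upper triangle of the raw five-line list. If the output matrix must keep one row per input line, the rows for repeated words can be copied from the unique-word matrix afterwards.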

