PerlMonks  

Advice on make my programme faster

by smilly (Novice)
on Feb 11, 2008 at 17:56 UTC

smilly has asked for the wisdom of the Perl Monks concerning the following question:

Dear Friends, I wrote a Perl program that builds a similarity matrix, but it runs slowly; in fact, it is slower than the main module it is used in. Any suggestions to make it faster? I would really appreciate it.
#! /usr/local/bin/perl -w
use strict;
use warnings;
use WordNet::QueryData;
use WordNet::Similarity::random;
use WordNet::Similarity::path;
use WordNet::Similarity::wup;
use WordNet::Similarity::lch;
use WordNet::Similarity::jcn;
use WordNet::Similarity::res;
use WordNet::Similarity::lin;
use WordNet::Similarity::hso;
use WordNet::Similarity::lesk;
use WordNet::Similarity::vector;
use WordNet::Similarity::vector_pairs;
use Data::Dumper;

my $Infile  = shift;
my $Outfile = shift;
my $Measure = shift;
my (@sim , $simi);

unless (defined $Infile and defined $Outfile and defined $Measure) {
    print STDERR "Undefined input\n";
    print STDERR "Usage: simmat.pl inputfile outputfile measure(WordNet::Similarity::path)\n";
    exit 1;
}

print STDERR "Loading WordNet... \nLoading WordNet::QueryData... ";
my $wn = WordNet::QueryData->new;
die "Unable to create WordNet object.\n" if(!$wn);
print STDERR "done.\n";

open (INPUT, "$Infile") || die "can't open the input file";
chomp (my @words = <INPUT>);
close (INPUT) ;

for my $i (0 .. $#words) {
    for my $j ( ($i) .. $#words) {
        $sim[$i][$j] = similarity( $words[$i], $words[$j]);
        $sim[$j][$i] = $sim[$i][$j];
    }
}

sub similarity {
    my ( $w1, $w2 ) = @_;
    $simi = 1;
    my $obj = $Measure->new($wn);
    my $simi = $obj->getRelatedness("$w1", "$w2");
    return $simi;
}

open (OUTPUT, ">$Outfile");
print OUTPUT Dumper(\@sim);
close(OUTPUT);

Replies are listed 'Best First'.
Re: Advice on make my programme faster
by suaveant (Parson) on Feb 11, 2008 at 18:38 UTC
    Well, if similarity gets called often with the same arguments then Memoize could help, caching values to prevent recalculation...
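
A minimal sketch of the Memoize idea, assuming a stand-in similarity function (the body below is a placeholder, not the real getRelatedness call) and made-up sample words. The NORMALIZER option sorts the two arguments so that (cat, dog) and (dog, cat) share one cache entry, which only makes sense if the measure is symmetric.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Stand-in for the expensive call (hypothetical body); it counts
# how often the real computation would actually run.
sub similarity {
    my ($w1, $w2) = @_;
    $calls++;
    return length($w1) + length($w2);
}

# Cache results; sorting the arguments makes the cache key
# independent of argument order (assumes a symmetric measure).
memoize('similarity', NORMALIZER => sub { join "\x1F", sort @_ });

my $first  = similarity('cat', 'dog');
my $second = similarity('dog', 'cat');   # served from the cache
my $third  = similarity('cat', 'dog');   # served from the cache

print "underlying calls: $calls\n";
```

Memoize only pays off if the same pair really is requested more than once, so it is worth checking that before relying on it.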

    Another thing to look into is Devel::SmallProf which shows you how many times a line is run and how long your program spends running it, to help you find slowdowns.

    You may also find a speedup if you can manage to create $Measure->new($wn) only once instead of at each call, but I'm not sure if the module will allow that, not being familiar with it.
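
A sketch of building the measure object once, assuming the module tolerates reuse of a single object across calls. My::Measure below is a hypothetical stand-in class that counts constructions; in the original script, $Measure would name one of the WordNet::Similarity classes and $wn would be the WordNet::QueryData object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a WordNet::Similarity measure class.
package My::Measure;
my $constructed = 0;
sub new {
    my ($class, $wn) = @_;
    $constructed++;
    return bless { wn => $wn }, $class;
}
sub getRelatedness {
    my ($self, $w1, $w2) = @_;
    return $w1 eq $w2 ? 1 : 0.5;     # placeholder score
}
sub constructed { return $constructed }

package main;

my $Measure = 'My::Measure';
my $wn      = {};                     # placeholder for the QueryData object
my @words   = qw(cat dog bird);       # sample data

my $obj = $Measure->new($wn);         # built once, before the loops

my @sim;
for my $i (0 .. $#words) {
    for my $j ($i .. $#words) {
        $sim[$i][$j] = $sim[$j][$i]
            = $obj->getRelatedness($words[$i], $words[$j]);
    }
}

printf "objects constructed: %d\n", My::Measure->constructed;
```

With the constructor inside similarity, the same loops would build one object per pair instead of one in total.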

                    - Ant
                    - Some of my best work - (1 2 3)

Re: Advice on make my programme faster
by kyle (Abbot) on Feb 11, 2008 at 19:07 UTC

    Have a look at Profiling your code. Profiling will tell you what parts of your program use the most time, and you can focus your efforts there. If you find the results of profiling confusing, we can help with that.
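
If a full profiler is not at hand, a coarse first look can come from timing the suspect steps by hand with the core Time::HiRes module. This is only a sketch: timed, build_measure and relatedness are hypothetical names, and the two stand-in subs merely play the roles of object construction and the relatedness call. It does not replace a real profiler.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %elapsed;

# Run a code ref and add its wall-clock time to a named bucket.
sub timed {
    my ($label, $code) = @_;
    my $start  = time();
    my @result = $code->();
    $elapsed{$label} += time() - $start;
    return wantarray ? @result : $result[0];
}

# Hypothetical stand-ins for the two suspects in the original script.
sub build_measure { my %obj = map { $_ => 1 } 1 .. 1000; return \%obj }
sub relatedness   { return length($_[0]) + length($_[1]) }

for my $pair ([qw(cat dog)], [qw(dog bird)], [qw(cat bird)]) {
    my $obj   = timed(construct => \&build_measure);
    my $score = timed(relate    => sub { relatedness(@$pair) });
}

printf "%-10s %.6f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```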

Re: Advice on make my programme faster
by roboticus (Chancellor) on Feb 11, 2008 at 21:51 UTC
    smilly:

    You may be solving the wrong problem altogether. If your word list is very large (as I suspect it is), you'll be calculating a huge set of values. For many problems, you won't need the entire matrix. It may be better to compute them as needed and store them in the array when computed. That way, you compute only the ones you want, and once only. Something like:

    sub get_similarity {
        my $i = shift;
        my $j = shift;
        # Matrix is symmetric around x==y
        ($i,$j) = ($j,$i) if $i > $j;
        return $sim[$i][$j] if defined $sim[$i][$j];
        $sim[$i][$j] = similarity($words[$i], $words[$j]);
    }

    Extra credit if you:

    • Use a hash instead of an array.
    • Tie your array or hash to a dbm file so you keep your values between runs...
    ...roboticus
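
The two "extra credit" items can be sketched together: a hash keyed by the word pair, tied to a DBM file so values survive between runs. This assumes the core SDBM_File module is acceptable (it limits the size of each key/value pair, which is fine for short words and numbers). The cache path, the stand-in similarity body, and the sample words are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # DBM implementation shipped with Perl
use File::Temp qw(tempdir);

# Hypothetical cache location; a real script would use a fixed path
# so the values are still there on the next run.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/simcache";

tie my %cache, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

my $computed = 0;

# Stand-in for the expensive relatedness call (placeholder body).
sub similarity {
    my ($w1, $w2) = @_;
    $computed++;
    return length($w1) + length($w2);
}

sub get_similarity {
    my ($w1, $w2) = @_;
    ($w1, $w2) = ($w2, $w1) if $w1 gt $w2;   # symmetric: one key per pair
    my $key = "$w1\t$w2";
    $cache{$key} = similarity($w1, $w2) unless exists $cache{$key};
    return $cache{$key};
}

my $ab = get_similarity('cat', 'dog');
my $ba = get_similarity('dog', 'cat');       # read back from the DBM file
print "computed $computed time(s)\n";
```

Keying on the words rather than on array indices keeps the cache valid even if the word list is reordered between runs.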
Re: Advice on make my programme faster
by lima1 (Curate) on Feb 11, 2008 at 20:44 UTC
    move
    my $obj = $Measure->new($wn);
    before the for loop (Update: suaveant already mentioned this, sorry).

    Check your word list. Does it contain duplicated lines (you should never have to use Memoize)? Try this to spot errors:

    sub similarity {
        my ( $w1, $w2 ) = @_;
        warn "'$w1' vs '$w2'";
        return $obj->getRelatedness($w1, $w2);
    }
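
Checking the word list for duplicated lines can also be done up front, before any similarity is computed. A small sketch using a %seen hash; the sample words are made up, and in the original script @words would come from the input file.

```perl
use strict;
use warnings;

my @words = qw(cat dog cat bird dog);   # sample data, not from the post

my (%seen, @unique, @dupes);
for my $w (@words) {
    if ($seen{$w}++) { push @dupes,  $w }   # second or later occurrence
    else             { push @unique, $w }   # first occurrence, keep order
}

warn "duplicate word: $_\n" for @dupes;
print "unique words: @unique\n";
```

Building the matrix from @unique means no pair is computed twice merely because a word was listed twice.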
      Thanks, good advice about moving $obj out of the loop.
Re: Advice on make my programme faster
by downer (Monk) on Feb 12, 2008 at 05:03 UTC
    If WordNet similarity is symmetric (I suspect it is), then you only need to compute half your matrix (j = 1 .. #words-1, i = j+1 .. #words). Additionally, because language follows a power law, it stands to reason that you are comparing some pairs of words more often than others. Maybe you can keep a hash to check whether you've seen a pair of words before; if so, you don't need to compute it again. I don't know if this will gain you much, but it may be worth a try.

    This is what I can think of for now. Try profiling; that usually gives you an idea of where to start.

    ~downer
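
To put numbers on the half-matrix idea, the loop bounds above can be written out with 0-based indices and the resulting pairs counted. A sketch with made-up sample words:

```perl
use strict;
use warnings;

my @words = qw(cat dog bird fish);   # sample data, not from the thread
my $n     = @words;

# j runs over all but the last word, i over the words after j,
# so every unordered pair appears exactly once.
my @pairs;
for my $j (0 .. $n - 2) {
    for my $i ($j + 1 .. $n - 1) {
        push @pairs, [ $words[$j], $words[$i] ];
    }
}

printf "full matrix: %d calls, distinct pairs: %d calls\n",
    $n * $n, scalar @pairs;
```

For n words this gives n(n-1)/2 pairs; the diagonal (a word against itself) costs n more calls if it is wanted, which is what the loop in the original post already does.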
