 
PerlMonks  

Re: Similarity of strings

by VSarkiss (Monsignor)
on May 15, 2002 at 14:52 UTC ( [id://166733]=note )


in reply to Similarity of strings

I'm not sure if this is exactly what you're looking for, but you can find longest common subsequences with Algorithm::Diff. With that, you could find the lengths of the differing sequences and divide their total by the overall length. (If you have trouble understanding the documentation for Algorithm::Diff, I wrote a module review which may help.)
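
A rough sketch of the Algorithm::Diff idea, for illustration only. The module's LCS function takes two array references and, in list context, returns the elements of the longest common subsequence. The helper name lcs_similarity and the choice to divide the LCS length by the length of the longer string are assumptions made here, not something this node specifies.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(LCS);

# Hypothetical helper: fraction of the longer string that is covered by
# the longest common subsequence of the two strings.
sub lcs_similarity {
    my ($ref_seq, $test_seq) = @_;
    my @ref_elems  = split //, $ref_seq;
    my @test_elems = split //, $test_seq;

    # In list context LCS returns the elements of the common subsequence.
    my @lcs = LCS(\@ref_elems, \@test_elems);

    # Assumption: normalise by the longer of the two inputs.
    my $longest = @ref_elems > @test_elems ? @ref_elems : @test_elems;
    return $longest ? @lcs / $longest : 1;
}

printf "%.2f\n", lcs_similarity('ACGTACGT', 'ACGTTCGT');
```

Unlike a position-by-position comparison, this tolerates insertions and deletions, at the cost of noticeably more work per pair of strings.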

If the code snippet you have above is an accurate description of what you're trying to calculate, it may be faster (though more memory-intensive) to split up the strings into arrays and compare an element at a time, rather than calling substr over and over. Something like this (note: untested):

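    my @ref_elems  = split //, $ref_seq;
    my @test_elems = split //, $test_seq;
    my $score = 0;
    # $len is the number of positions to compare, as in your snippet
    for (my $i = 0; $i < $len; $i++) {
        # eq yields 1 on a match and an empty string (numerically 0) otherwise
        $score += $ref_elems[$i] eq $test_elems[$i];
    }
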
Once you have the sequences in arrays, you can use all kinds of nifty techniques like mapcar, which can traverse both arrays in one neat statement. The top of that node has a very clear explanation of how to use it.
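
To make the split-and-compare idea concrete, here is a minimal self-contained sketch that turns the score into a percentage. It does not use mapcar; a plain grep over the indices stands in for the one-statement traversal. The name percent_identity and the rule that positions past the end of the shorter string count as mismatches are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of positions at which two strings hold
# the same character. Positions past the end of the shorter string count
# as mismatches (an assumption made for this sketch).
sub percent_identity {
    my ($ref_seq, $test_seq) = @_;
    my @ref_elems  = split //, $ref_seq;
    my @test_elems = split //, $test_seq;

    my $shorter = @ref_elems < @test_elems ? @ref_elems : @test_elems;
    my $longer  = @ref_elems > @test_elems ? @ref_elems : @test_elems;
    return 100 if $longer == 0;    # two empty strings are identical

    # grep in scalar context counts the indices where the characters match.
    my $score = grep { $ref_elems[$_] eq $test_elems[$_] } 0 .. $shorter - 1;

    return 100 * $score / $longer;
}

printf "%.1f%%\n", percent_identity('ACGTACGT', 'ACGTTCGT');
```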

HTH

Replies are listed 'Best First'.
Re: Re: Similarity of strings
by professa (Beadle) on May 15, 2002 at 15:22 UTC
    I tested splitting the strings up into arrays and timed the two methods (simply via 'time script.pl').
    The split-method takes ~13 seconds to finish, the substr-method only ~7 seconds.
    The advantages of having the data ready in arrays don't matter to me; I just need the percentage of similarity, as fast as possible. ;-)
    I'll try out the rest of the suggested methods here and report which does best.

    Thanx, Micha
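
For a finer-grained comparison than timing the whole script, the core Benchmark module can run both variants side by side. The sketch below is only an illustration: score_substr is an assumed reconstruction of the substr approach (the original snippet is not shown in this node), the test strings are made up, and both functions assume the two strings have equal length.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed reconstruction of the substr approach: compare one character
# at a time without building arrays.
sub score_substr {
    my ($ref_seq, $test_seq) = @_;
    my $len   = length $ref_seq;
    my $score = 0;
    for (my $i = 0; $i < $len; $i++) {
        $score++ if substr($ref_seq, $i, 1) eq substr($test_seq, $i, 1);
    }
    return $score;
}

# The split approach from the parent node.
sub score_split {
    my ($ref_seq, $test_seq) = @_;
    my @ref_elems  = split //, $ref_seq;
    my @test_elems = split //, $test_seq;
    my $score = 0;
    for (my $i = 0; $i < @ref_elems; $i++) {
        $score += $ref_elems[$i] eq $test_elems[$i];
    }
    return $score;
}

# Made-up test data: two equal-length strings that differ at one position.
my $ref_seq  = 'ACGT' x 250;
my $test_seq = $ref_seq;
substr($test_seq, 500, 1) = 'T';

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    'substr' => sub { score_substr($ref_seq, $test_seq) },
    'split'  => sub { score_split($ref_seq, $test_seq) },
});
```

The absolute numbers will depend on the Perl version and the string lengths, so the table is best read as a relative comparison on your own data.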
