PerlMonks  

similar string matching

by Murcia (Monk)
on Jul 05, 2004 at 09:11 UTC (id://371826)

Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I have a slightly tricky task at work. It is a string-matching problem.

I want to compare two amino acid sequences (in one-letter code, where each character represents one amino acid) to find at which positions one sequence lies within the other. For example:

MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA  # 1. sequence
TTTTTTTTFTTTTTTTTTTTT  # 2. sequence

Result: 2. lies at positions 14 to 34 in 1. (counting from 1).

Simple? (For this case I need no help!)
Now some harder examples:

SUBSTITUTION
AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA  # 1. sequence
TTTTFTTTATTTTTTDTTTTT  # 2. sequence

DELETION
AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA  # 1. sequence
TTGTTTTTTTTTTTTTTTTTM  # 2. sequence

REVERSE
TTTTTTTTTTTTTTTTTTTT  # 1. sequence
AAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTAAAAAAAAAAAAAAAA  # 2. sequence

PERFECT MATCH ONLY AT THE BEGINNING AND END OF THE 2. SEQUENCE
AAAAAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGTTTTTTTTTAAAAAAA  # 1. sequence
TTTTTTTTGGGNNGGGEEGGGEGGGGGGTTTTTTTTT  # 2. sequence
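For the SUBSTITUTION case on its own (no gaps), one simple approach, not taken from this thread, is to slide the 2. sequence along the 1. sequence and count the mismatching residues at each offset (the Hamming distance). This is a minimal sketch assuming both inputs are plain one-letter strings; the sub name best_ungapped_window is hypothetical. It uses the common Perl idiom of XOR-ing two equal-length strings and counting the bytes that are not NUL.

```perl
use strict;
use warnings;

# Hypothetical helper: slide $short over $long and count mismatching
# residues at every offset (substitutions only, no insertions/deletions).
# Returns (1-based start, 1-based end, mismatches) of the best offset,
# taking the leftmost one on ties, or an empty list if $short is longer.
sub best_ungapped_window {
    my ($long, $short) = @_;
    my $n = length $short;
    my ($best_pos, $best_mm);
    for my $pos (0 .. length($long) - $n) {
        my $window = substr($long, $pos, $n);
        # Identical characters XOR to "\0"; count everything else.
        my $diff = $window ^ $short;
        my $mm   = ($diff =~ tr/\0//c);
        if (!defined $best_mm || $mm < $best_mm) {
            ($best_pos, $best_mm) = ($pos, $mm);
        }
    }
    return unless defined $best_pos;
    return ($best_pos + 1, $best_pos + $n, $best_mm);
}

my @hit = best_ungapped_window(
    'AAAAEAAAARGAAATTTTFTTTTTTTTTTTTTTTTAAAAAAAAILVAAAAAAAA',
    'TTTTFTTTATTTTTTDTTTTT');
printf "2. lies at %d to %d in 1. (%d mismatches)\n", @hit;
```

Because it never opens gaps, this will not handle the DELETION example well; it only answers "where does the 2. sequence fit with the fewest substitutions".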

I tried regexes and the String::Approx module's aslice with the 'minimal_distance' option, but I don't like the values this module returns.
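Since the complaint is about String::Approx's return values, here is a minimal sketch of a swapped-in technique, semi-global edit distance (often attributed to Sellers), that returns exactly a start, an end and a distance. The whole 2. sequence must be aligned, but it may begin and end anywhere in the 1. sequence; substitutions and gaps each cost 1. The sub name best_approx_hit is hypothetical. It runs in time proportional to length1 x length2, which is fine for sequences of this size. On repetitive stretches such as long T runs several alignments can tie, so the reported start is only one of them; for real protein work a scoring matrix such as BLOSUM62 would normally replace the unit costs.

```perl
use strict;
use warnings;

# Hypothetical helper: semi-global edit distance. Leading and trailing
# text characters are free; substitutions, skipped text characters and
# skipped pattern characters each cost 1.
# Returns (1-based start, 1-based end, distance) of the lowest-cost hit,
# taking the leftmost end position on ties.
sub best_approx_hit {
    my ($text, $pat) = @_;
    my @t = split //, $text;
    my @p = split //, $pat;
    my $n = scalar @t;

    # Row 0: an empty pattern matches anywhere at cost 0, starting at $j.
    my @prev       = (0) x ($n + 1);
    my @prev_start = (0 .. $n);

    for my $i (1 .. scalar @p) {
        my @cur       = ($i);    # $i pattern chars against empty text
        my @cur_start = (0);
        for my $j (1 .. $n) {
            my $diag = $prev[$j - 1] + ($p[$i - 1] eq $t[$j - 1] ? 0 : 1);
            my $up   = $prev[$j] + 1;       # skip a pattern character
            my $left = $cur[$j - 1] + 1;    # skip a text character
            if ($diag <= $up && $diag <= $left) {
                ($cur[$j], $cur_start[$j]) = ($diag, $prev_start[$j - 1]);
            }
            elsif ($up <= $left) {
                ($cur[$j], $cur_start[$j]) = ($up, $prev_start[$j]);
            }
            else {
                ($cur[$j], $cur_start[$j]) = ($left, $cur_start[$j - 1]);
            }
        }
        @prev       = @cur;
        @prev_start = @cur_start;
    }

    # The best hit ends at the column with the smallest cost in the last row.
    my $best_j = 0;
    for my $j (1 .. $n) {
        $best_j = $j if $prev[$j] < $prev[$best_j];
    }
    return ($prev_start[$best_j] + 1, $best_j, $prev[$best_j]);
}

my ($from, $to, $dist) = best_approx_hit(
    'AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA',
    'TTGTTTTTTTTTTTTTTTTTM');
print "2. lies at $from to $to in 1. (edit distance $dist)\n";
```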

Any hints on the best way to do this?

Murcia

Edited by Chady -- fixed formatting.

Re: similar string matching
by Crian (Curate) on Jul 05, 2004 at 10:02 UTC
    I think I need a little more information.

    For SUBSTITUTION: Do I get the part to substitute as a parameter, or do I have to guess it (it starts with TT and ends with TT, or something like that)?

    Same question for DELETION: Do I have to guess what is going to be deleted?

    How long, or rather how short, may the matching parts at the beginning and the end be for a successful return?

    Please describe your needs a little more precisely.
      What counts as a successful return is a good question! It is quite difficult to define. I want the best values for precision and recall. DELETIONS: I think a minimum of 5 matching amino acids at both ends is enough for a successful return. Murcia
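The DELETION criterion above (at least 5 matching amino acids at both ends) can be tested directly with a regex: take the first and last k residues of the 2. sequence and require both to occur in the 1. sequence, in that order, with anything in between. This is a minimal sketch assuming k = 5 by default; anchored_ends_hit is a hypothetical name. Because it ignores the middle entirely, it also accepts the PERFECT MATCH ONLY AT THE BEGINNING AND END case, so on its own it is permissive and leans towards recall rather than precision.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a hit if the first and last $k residues of
# $pat occur in $text in that order, with anything in between.
# Returns 1-based (start, end) for the leftmost head followed by the
# nearest tail, or an empty list if there is no such span.
sub anchored_ends_hit {
    my ($text, $pat, $k) = @_;
    $k //= 5;
    # Require non-overlapping head and tail in the pattern itself.
    return if length($pat) < 2 * $k;
    my $head = substr($pat, 0, $k);
    my $tail = substr($pat, -$k);
    return unless $text =~ /\Q$head\E.*?\Q$tail\E/;
    # @- and @+ hold the 0-based start and one-past-end of the match.
    return ($-[0] + 1, $+[0]);
}

my @span = anchored_ends_hit(
    'AAAAAAAAAAAAATTGTTTTTTTXXXXXTTTTTTTTTTMAAAAAAAAAAAAAAAA',
    'TTGTTTTTTTTTTTTTTTTTM', 5);
print @span ? "2. lies at $span[0] to $span[1] in 1.\n" : "no hit\n";
```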
Re: similar string matching
by Anonymous Monk on Jul 05, 2004 at 09:23 UTC

    Have you tried index?

    my $s = "MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA";
    my $f = "TTTTTTTTFTTTTTTTTTTTT";
    print index($s, $f);    # gives 13 (0-based, i.e. position 14 when counting from 1)
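    index returns a 0-based offset (or -1 if there is no match), so getting the 1-based "position 14 to 34" form from the question means adding 1 for the start and adding the pattern length to the offset for the end. A minimal sketch that also loops with index's optional third POSITION argument to collect every occurrence, and swaps the arguments by length to cover the REVERSE example; exact_hits is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: every exact occurrence of the shorter sequence
# inside the longer one, as 1-based [start, end] pairs. Swapping by
# length covers the REVERSE case (1. lies inside 2.).
sub exact_hits {
    my ($s1, $s2) = @_;
    my ($long, $short) = length($s1) >= length($s2) ? ($s1, $s2) : ($s2, $s1);
    return if $short eq '';    # an empty pattern would match everywhere
    my @hits;
    my $pos = index($long, $short);
    while ($pos != -1) {
        push @hits, [ $pos + 1, $pos + length $short ];
        $pos = index($long, $short, $pos + 1);    # allows overlapping hits
    }
    return @hits;
}

for my $hit (exact_hits('MAAGAAAAFAAAATTTTTTTTFTTTTTTTTTTTTAAAAEAAAARAAAAAA',
                        'TTTTTTTTFTTTTTTTTTTTT')) {
    print "2. lies at $hit->[0] to $hit->[1] in 1.\n";
}
```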
