PerlMonks  

Re: fuzzy match: trim sequences outside of the forward and reverse primer set.

by grizzley (Chaplain)
on Nov 08, 2012 at 08:29 UTC ( [id://1002833]=note )


in reply to fuzzy match: trim sequences outside of the forward and reverse primer set.

I couldn't find the word 'fuzzy' in the Bio::Perl docs. Can you explain what a fuzzy match in Bio::Perl does (which operation it is) and what the criteria are for trimming the sequence after the match?
Re^2: fuzzy match: trim sequences outside of the forward and reverse primer set.
by lrl1997 (Novice) on Nov 08, 2012 at 15:24 UTC

    Hi grizzley. By "fuzzy match" I mean a match that is not perfect. For example, say I have a forward primer "agct", and I want to find it in the following sequences and trim off the region before it:

    >seq1

    aaagctcccc

    >seq2

    aaacctgggg

    If I perform a "perfect match" search and trim, only seq1 contains "agct". After the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. There is no perfect match to "agct" in seq2, so it is left untouched.
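The perfect-match case can be sketched with Perl's built-in index and substr. This is only a minimal illustration; the name trim_exact is made up here, and the sequences are assumed to be plain lowercase strings already read from the FASTA file.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything before the first exact
# occurrence of $primer, keeping the primer itself.
# The sequence is returned unchanged when the primer is absent.
sub trim_exact {
    my ($seq, $primer) = @_;
    my $pos = index($seq, $primer);          # -1 when not found
    return $pos < 0 ? $seq : substr($seq, $pos);
}

print trim_exact('aaagctcccc', 'agct'), "\n";   # agctcccc
print trim_exact('aaacctgggg', 'agct'), "\n";   # aaacctgggg (untouched)
```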

    For a "fuzzy match" search, if I allow up to 1 or 2 mismatches for "agct", then both seq1 and seq2 would be trimmed. seq1 contains "agct", and seq2 contains "acct", which has 1 mismatch (a "g" substituted by "c"). After the trim, the result should be:

    >seq1

    agctcccc

    >seq2

    acctgggg

    But there are many possible combinations: with 1 mismatch, "agct" could appear as "acct", "ggct", etc. Bio::Grep can do this kind of "fuzzy match" search, but it only outputs the sequences that contain such regions. I don't think it performs the trimming as a downstream step. I do not know how to write a Perl program to do this. I would really appreciate your help.
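One possible way to do the search and the trim in plain Perl, without Bio::Grep, is to slide a primer-sized window along the sequence and count mismatches (the Hamming distance). The sketch below is not a Bio::Perl or Bio::Grep API; hamming, find_fuzzy and trim_fuzzy are hypothetical names. It assumes that only substitutions count (no insertions or deletions), that both strings use the same case, and that the window with the fewest mismatches should win, with ties going to the leftmost one.

```perl
use strict;
use warnings;

# Number of positions at which two equal-length strings differ.
sub hamming {
    my ($x, $y) = @_;
    my $mm = 0;
    for my $i (0 .. length($x) - 1) {
        $mm++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $mm;
}

# Return (position, matched text) of the best window with at most
# $max_mm mismatches, or an empty list when no window qualifies.
sub find_fuzzy {
    my ($seq, $primer, $max_mm) = @_;
    my $len = length $primer;
    my ($best_pos, $best_mm);
    for my $pos (0 .. length($seq) - $len) {
        my $mm = hamming(substr($seq, $pos, $len), $primer);
        next if $mm > $max_mm;
        ($best_pos, $best_mm) = ($pos, $mm)
            if !defined $best_mm || $mm < $best_mm;
    }
    return defined $best_pos ? ($best_pos, substr($seq, $best_pos, $len)) : ();
}

# Trim everything before the best fuzzy match; keep the primer region.
sub trim_fuzzy {
    my ($seq, $primer, $max_mm) = @_;
    my ($pos) = find_fuzzy($seq, $primer, $max_mm);
    return defined $pos ? substr($seq, $pos) : $seq;
}

print trim_fuzzy('aaagctcccc', 'agct', 1), "\n";   # agctcccc
print trim_fuzzy('aaacctgggg', 'agct', 1), "\n";   # acctgggg
```

Picking the best window rather than the first acceptable one matters once 2 mismatches are allowed: in seq2 the window "aacc" at the previous position already has only 2 mismatches, so a first-hit search would trim one base too few.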

      In that case there are at least two possibilities:
        1. If Bio::Perl can match with wildcards, you can do a fuzzy match on 'agct.*'.
        2. Do the fuzzy match with Bio::Perl and use the returned matched string for a perfect match or, better, a substitution: s/.*?(?=$returnedstring)//
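The substitution in option 2 might look like this in a full script. This is a sketch that assumes the fuzzy search has already reported the matched string; trim_before is a made-up helper name, and the \Q...\E quoting is an addition so the reported string is always treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the shortest prefix that is directly
# followed by $matched. The lookahead (?=...) keeps $matched itself.
sub trim_before {
    my ($seq, $matched) = @_;
    (my $trimmed = $seq) =~ s/^.*?(?=\Q$matched\E)//s;
    return $trimmed;    # unchanged if $matched does not occur
}

# 'acct' stands in for whatever string the fuzzy search returned.
print trim_before('aaacctgggg', 'acct'), "\n";   # acctgggg
```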

        I don't think it returns the fuzzy-matched string, only the sequence containing it. Therefore, I have no way to know which string was found. Any more suggestions?
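If the search tool does not report what it matched, one alternative is to let a plain Perl regex do the fuzzy search, since the capture group then holds the matched string. The sketch below handles exactly the "at most one substitution" case by building one alternative per primer position with that position replaced by "."; one_mismatch_regex is a hypothetical name, and this is an illustration rather than anything Bio::Grep provides.

```perl
use strict;
use warnings;

# Build a regex for $primer with at most one substituted base,
# e.g. 'agct' gives (.gct|a.ct|ag.t|agc.)
sub one_mismatch_regex {
    my ($primer) = @_;
    my @alts;
    for my $i (0 .. length($primer) - 1) {
        push @alts, quotemeta(substr($primer, 0, $i)) . '.'
                  . quotemeta(substr($primer, $i + 1));
    }
    my $pattern = join '|', @alts;
    return qr/($pattern)/;
}

my $re = one_mismatch_regex('agct');
for my $seq ('aaagctcccc', 'aaacctgggg') {
    if ($seq =~ $re) {
        # $1 is the string that was found, $-[1] is its start offset.
        printf "%s found at %d -> %s\n", $1, $-[1], substr($seq, $-[1]);
    }
}
```

For the two example sequences this reports "agct" and "acct", both at offset 2. The number of alternatives grows quickly once more than one mismatch is allowed, so for that case a window-by-window mismatch count is probably the simpler route.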
