PerlMonks  

Re: Perl regular expression for amino acid sequence

by Roy Johnson (Monsignor)
on Dec 01, 2004 at 20:06 UTC


in reply to Perl regular expression for amino acid sequence

This will come close, but will fail if the match is followed by extra repetitions.
/(?:(?!(.)\1\1)[QGYN]){3,6}/;
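To make that failure mode concrete, here is a minimal sketch. The helper name `short_pattern_hits` and the test string are made up for illustration and are not from the OP's data. It shows the short pattern stopping before a trailing triple, even though two of those characters could legally be part of the match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect [match, start] pairs
# produced by the short lookahead pattern.
sub short_pattern_hits {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /(?:(?!(.)\1\1)[QGYN]){3,6}/g) {
        # @- and @+ hold the start and end offsets of the whole match.
        push @hits, [ substr($seq, $-[0], $+[0] - $-[0]), $-[0] ];
    }
    return @hits;
}

# 'GYGY' is followed by 'NNN'. The lookahead refuses the first N of the
# triple, so the match stops at 'GYGY' although 'GYGYNN' would be acceptable.
printf "%s at %d\n", @$_ for short_pattern_hits('xxGYGYNNNxx');
# prints: GYGY at 2
```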
You might have to consider each character separately, which leads to a long ugly string of alternations. The first char matches your character class. The second is either not a repeat, or is a repeat followed by not a repeat. The third is either not a repeat of the second, or a repeat followed by not a repeat.

After that, the pattern is repeated for the 4th and 5th characters, but they're all optional and nested (so if you don't have the 4th char, you don't look for the 5th). The 6th char doesn't need to check for repetitions, because it was checked by the pattern for the 5th char.

while ($seq{$k} =~ /(([QGYN])
                     ((?!\2)[QGYN]|\2(?!\2))
                     ((?!\3)[QGYN]|\3(?!\3))
                     (?:((?!\4)[QGYN]|\4(?!\4))
                        (?:((?!\5)[QGYN]|\5(?!\5))
                           [QGYN]?)?)?)
                   /xg) {
    print "\n$k";
    print $1." begins at position ", (pos($seq{$k})-length($s)) , "\n";
}
Update: adjusted to fit OP's code snippet.
Update2: As Ikegami noted (and I noted in responding to a different post), this solution has the problem of looking too far ahead. It won't take the first two characters out of a trio. A working regex-only solution is posted as a reply to this post.
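The look-ahead problem can be reproduced with a small sketch. The sub name `alternation_hits` is hypothetical, and the start offset is computed from `length($1)` rather than the OP's `$s`, which is an assumption made so the snippet runs on its own. The input string is the one from ikegami's reply.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run the alternation pattern
# and return [match, start] pairs.
sub alternation_hits {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /(([QGYN])
                     ((?!\2)[QGYN]|\2(?!\2))
                     ((?!\3)[QGYN]|\3(?!\3))
                     (?:((?!\4)[QGYN]|\4(?!\4))
                        (?:((?!\5)[QGYN]|\5(?!\5))
                           [QGYN]?)?)?)
                   /xg) {
        # Assumption: start offset = end of match minus length of the match.
        push @hits, [ $1, pos($seq) - length($1) ];
    }
    return @hits;
}

my $input = 'xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxxGYGYNNNxxxxxxxNNNGNNNxxxxxxx';
printf "%s begins at position %d\n", @$_ for alternation_hits($input);
```

On this input, 'GNN' at position 7 is missed entirely, and the last hit is 'NNGN' rather than 'NNGNN', because a repeated character is rejected whenever a third copy follows it.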

Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re^2: Perl regular expression for amino acid sequence
by Roy Johnson (Monsignor) on Dec 01, 2004 at 21:45 UTC
    Here's a pure regex solution that works:
    use strict;
    use warnings;

    while (<DATA>) {
        print "$_---\n";
        my $m;
        while (/([QGYN]{2}       # First two characters of the desired class
                 (?:             # Followed by the complex expression...
                   # Lookback at the previous two chars
                   (?<=(.)(.))
                   # Check that the next char differs from at least one of them
                   (?:(?!\2)|(?!\3))
                   [QGYN]        # Then take another of the desired class
                 ){1,4}          # ...1 to 4 times
                )/gx) {
            $m = $1;
            printf "---> $m starting at %d\n", pos($_)-length($m);
        }
        print "=====\n";
    }
    __DATA__
    QYGNGNG
    GGGGGNYGNQYNNNQGYQ
    QGYNNN
    xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxxGYGYNNNxxxxxxxNNNGNNNxxxxxxx
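
As a quick check, the same pattern can be wrapped in a sub and run against the input from ikegami's reply. This is only a sketch: the name `find_runs` and the return shape are hypothetical choices made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return [match, start] pairs
# for every run found by the lookbehind pattern.
sub find_runs {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([QGYN]{2}
                     (?:
                       (?<=(.)(.))         # the two characters just consumed
                       (?:(?!\2)|(?!\3))   # next char differs from one of them
                       [QGYN]
                     ){1,4}
                    )/gx) {
        push @hits, [ $1, pos($seq) - length($1) ];
    }
    return @hits;
}

my $line = 'xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxxGYGYNNNxxxxxxxNNNGNNNxxxxxxx';
printf "%s begins at position %d\n", @$_ for find_runs($line);
# prints:
# GNN begins at position 7
# NNGYGY begins at position 19
# GYGYNN begins at position 32
# NNGNN begins at position 47
```

Because the lookbehind only inspects characters that are already part of the match, a repeated pair is accepted and only the third copy is refused.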

    Caution: Contents may have been coded under pressure.

      This is a very nice solution; I haven't seen that trick before.

      Hugo

Re^2: Perl regular expression for amino acid sequence
by dragonchild (Archbishop) on Dec 01, 2004 at 20:47 UTC
    Build that programmatically.
    my $regex = '(([QGYN])';
    # Chars 2 and 3: either not a repeat, or a repeat followed by a non-repeat
    $regex .= '((?!\\' . $_ . ')[QGYN]|\\' . $_ . '(?!\\' . $_ . '))'
        for 2 .. 3;
    # Chars 4 and 5: same test, but optional and nested
    $regex .= '(?:((?!\\' . $_ . ')[QGYN]|\\' . $_ . '(?!\\' . $_ . '))'
        for 4 .. 5;
    $regex .= '[QGYN]?)?)?)';
    $regex = qr/$regex/;

    while ($seq{$k} =~ /$regex/g) {
        print "\n$k";
        print $1." begins at position ", (pos($seq{$k})-length($s)) , "\n";
    }

    Now, I have no idea what all that does, but it's easily broken apart. :-)

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^2: Perl regular expression for amino acid sequence
by ikegami (Patriarch) on Dec 01, 2004 at 21:11 UTC
    Input 'xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxxGYGYNNNxxxxxxxNNNGNNNxxxxxxx' gives:
    NNGYGY begins at position 19
    GYGYNN begins at position 32
    NNGN begins at position 47

    rather than

    GNN begins at position 7 <---
    NNGYGY begins at position 19
    GYGYNN begins at position 32
    NNGNN begins at position 47 <---

Re^2: Perl regular expression for amino acid sequence
by seaver (Pilgrim) on Dec 01, 2004 at 20:27 UTC
    Thank you!

    It sure did work, and it was surely surprising that it was so ugly. :-D

    Sam
