PerlMonks  

Re: Lookbehind and backreferences

by seaver (Pilgrim)
on Dec 02, 2004 at 15:59 UTC


in reply to Lookbehind and backreferences
in thread Perl regular expression for amino acid sequence

Roy

Thanks for your input (and everybody else's too). I see that you've given two slightly different solutions; am I right in assuming this one is THE solution?

Since my understanding of Perl regexes was limited to my initial pattern, I'm not sure I understand some of the conversation that has been going on. However, I realised that the length of the pattern found is a big topic, and I hadn't thought about that.

Truly, the longer the pattern, the more significant it is. However, I am looking for repeats of patterns within a sequence, and biologically, repeats don't have to be identical, so YYGNG is, to me, a repeat of YYGNN. But because variations could include other residues (almost the entire alphabet), it's also important that I get both short and long matches.

I guess what I'm trying to ask is: does your solution try to make the match as long as possible?

Thanks
Sam

ps: if anyone liked this challenge of regex, here's another challenge:

I'd like to find /[QYGN]{4,6}/ under the same conditions, except that the match may contain one residue of ANY letter.

Re^2: Lookbehind and backreferences
by Roy Johnson (Monsignor) on Dec 02, 2004 at 16:23 UTC
    This solution is functionally identical to the one I called a pure regex solution that works, so it's purely a matter of taste which you consider "THE solution".

    Both of them try to make the match as long as possible (up to six chars), though given GYNNNGYYY, you would get GYNN and NGYY rather than GYN and NNGYY. Earlier matches take all they can.
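To see the "earlier matches take all they can" behaviour in isolation, here is a minimal sketch using only the plain character class with a greedy {4,6} quantifier. It deliberately leaves out the extra conditions from the thread, so its output differs from the GYNN / NGYY result quoted above; the helper name greedy_runs is made up for illustration.

```perl
use strict;
use warnings;

# Collect [start offset, text] for each non-overlapping greedy match.
# Assumption: only the bare class and length range are applied here,
# not the additional conditions discussed in the thread.
sub greedy_runs {
    my ($seq) = @_;
    my @found;
    while ($seq =~ /([QGYN]{4,6})/g) {
        push @found, [ $-[1], $1 ];   # $-[1] is the start offset of group 1
    }
    return @found;
}

my @found = greedy_runs('GYNNNGYYY');
printf "%d-%d %s\n", $_->[0], $_->[0] + length($_->[1]) - 1, $_->[1]
    for @found;
# prints: 0-5 GYNNNG
```

The first match swallows six characters, and the three left over (YYY) are too short to form a second match. That is the flip side of greedy, leftmost matching: a later, possibly interesting, match can be lost to an earlier one.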

    Matching residues makes it very tricky. I will have to ponder that. Meanwhile, you might find it useful to find all your non-residue matches, and then use String::Approx to find copies of those with residues.
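As a rough illustration of that two-step idea, the sketch below takes one exact match (YYGNN, from the example earlier in the thread) and scans a sequence for same-length windows that differ from it in at most one position. This is a plain-Perl stand-in that only handles substitutions, not String::Approx itself; the sub names and the test sequence are invented for the example.

```perl
use strict;
use warnings;

# Number of positions at which two equal-length strings differ.
# XOR-ing the strings yields a NUL byte wherever they agree, so counting
# the non-NUL bytes counts the mismatches.
sub mismatches {
    my ($x, $y) = @_;
    my $xor = $x ^ $y;
    return $xor =~ tr/\0//c;
}

# Return [offset, window] for every window of $seq that differs from
# $motif in at most $max_diff positions (substitutions only; no
# insertions or deletions are considered).
sub near_copies {
    my ($seq, $motif, $max_diff) = @_;
    my $len = length $motif;
    my @hits;
    for my $pos (0 .. length($seq) - $len) {
        my $window = substr $seq, $pos, $len;
        push @hits, [ $pos, $window ]
            if mismatches($window, $motif) <= $max_diff;
    }
    return @hits;
}

# Hypothetical sequence containing YYGNN and the near-copy YYGNG.
my @hits = near_copies('AAYYGNNCCYYGNGAA', 'YYGNN', 1);
printf "%2d %s\n", @$_ for @hits;
```

On this input it reports YYGNN at offset 2 and YYGNG at offset 9. For real data you would run it once per exact match found in the first step; a module such as String::Approx would be the choice if insertions and deletions also need to count as variations.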


    Caution: Contents may have been coded under pressure.
