

in reply to Lookbehind and backreferences
in thread Perl regular expression for amino acid sequence

Roy

Thanks for your input (and everybody else's too). I see that you've given two slightly different solutions; am I right in assuming this one is THE solution?

Since my understanding of Perl regexes was limited to my initial pattern, I'm not sure I follow some of the conversation that has been going on. However, I realised that the length of the pattern found is a big topic, and I hadn't thought about that.

True, the longer the pattern, the more significant it is. However, I am looking for repeats of patterns within a sequence, and biologically, repeats don't have to be identical, so YYGNG, to me, is a repeat of YYGNN. But because variations could include other residues (almost the entire alphabet), it's also important that I get both short and long matches.

I guess what I'm trying to ask is: does your solution try to make the match as long as possible?

Thanks
Sam

PS: if anyone liked this regex challenge, here's another one:

I'd like to find /[QYGN]{4,6}/ under the same conditions, except that the match may contain one residue of ANY letter.
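
To make the second challenge concrete, here is a rough sketch of one way to read it. It is not a pure regex and not anybody's posted solution: it just slides a window of 6, 5, then 4 residues along the sequence and accepts the window if at most one residue falls outside QYGN. The sub name find_fuzzy_runs and the test sequence are made up for illustration, and the "same conditions" from the earlier discussion are not applied here.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch).
# Returns a list of [start_offset, matched_text] pairs. For each start
# position it keeps only the longest window of 4 to 6 residues that has
# at most one residue outside the set Q, Y, G, N.
sub find_fuzzy_runs {
    my ($seq) = @_;
    my @hits;
    for my $start (0 .. length($seq) - 4) {
        for my $len (reverse 4 .. 6) {          # try 6, then 5, then 4
            next if $start + $len > length $seq;
            my $window = substr $seq, $start, $len;
            # tr with /c and an empty replacement counts the characters
            # that are NOT in QYGN, without modifying $window.
            my $odd = ($window =~ tr/QYGN//c);
            if ($odd <= 1) {
                push @hits, [$start, $window];
                last;                           # longest window found
            }
        }
    }
    return @hits;
}

# Made-up sequence: X is the single "any letter" residue.
for my $hit (find_fuzzy_runs('QYGXNQ')) {
    printf "%d\t%s\n", @$hit;
}
```

Note that this reports overlapping hits (one per start position), so anything longer than 6 residues shows up as several windows. Whether that is desirable depends on how the repeats are meant to be counted.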