Hello rnaeye,
To capture 10 or more consecutive G characters, you don’t need (G)\1{9,}, just use (G{10,}), which is simpler and easier to read.
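As a minimal sketch of why the two forms differ in what they capture (the test string and the variable names below are made up for illustration, not taken from your data): with (G)\1{9,} the group itself only ever holds a single G, whereas (G{10,}) holds the whole run.

```perl
use strict;
use warnings;

# Hypothetical test string: ACGT, then a run of 12 Gs, then A
my $demo = 'ACGT' . ('G' x 12) . 'A';

# Back-reference form: the run is matched, but the group holds one G
my ($single_g) = $demo =~ / (G) \1{9,} /x;

# Quantifier inside the group: the group holds the entire run
my ($whole_run) = $demo =~ / (G{10,}) /x;

print length($single_g),  "\n";    # 1
print length($whole_run), "\n";    # 12
```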
Note that when you have a regex of the form / .* (G{10,}) /x, the leading .* is greedy and will match as much of the G-sequence as it can, so the capture will contain only the 10 Gs it needs to satisfy the match. If you want all the Gs (15 for the sample data given), you need to make the leading .* non-greedy: / .*? (G{10,}) /x.
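The effect is easy to check on a small string. This is only a sketch: the 15-G test string and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical test string: ACGT, then a run of 15 Gs, then A
my $sample = 'ACGT' . ('G' x 15) . 'A';

# Greedy .* swallows as many Gs as it can and leaves only the minimum 10
my ($greedy_run) = $sample =~ / .*  (G{10,}) /x;

# Non-greedy .*? stops as early as possible, so the group gets the full run
my ($lazy_run)   = $sample =~ / .*? (G{10,}) /x;

print length($greedy_run), "\n";    # 10
print length($lazy_run),   "\n";    # 15
```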
Your requirements are not clear (to me). Please provide the exact output you desire for the given input data (and additional lines of input together with the desired output for each). In the meantime, I’m guessing you want to find a 10-character ACTG sequence immediately following the specific sequence ACTCCAGTCACGCCAATATCTCGTAT and followed (but not necessarily immediately) by a run of 10 or more G characters:
use 5.18.2;
while (my $line = <DATA>)
{
    if ($line =~ m/ (ACTCCAGTCACGCCAATATCTCGTAT) ([ACTG]{10}) .*? (G{10,}) /x)
    {
        say for $1, $2, $3;    # Can use @{^CAPTURE} in Perl 5.25.7 and later
    }
}
__DATA__
GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGGGGAGGGGGGGAG
Output:
13:18 >perl 1986_SoPW.pl
ACTCCAGTCACGCCAATATCTCGTAT
GCCGTCTTCT
GGGGGGGGGGGGGGG
13:18 >
Hope that helps,