Yes, the direct matches are not a problem. I am glad you could confirm the code. For the mismatch code I mean:
I have a hash value which is a DNA sequence, example:
TGATTGAA
If my % threshold was .75, for example, and I was searching for TGAT, I would like the program to find the identical match at position 1, AND identify the second match at position 5 (TGAA, 3 of 4 bases identical) as an "acceptable mismatch".
With respect to your suggestion of regexp: is it possible to type those in on the command line? Currently I use $ARGV[0] as my $search query, typically something like "ATC".
Thanks!
ER
Your second question is much easier: Yes. Regular expressions can be built from any string, including those supplied by users. Generally, you should use quotemeta or the \Q and \E markers to make sure the string is free from regular expression meta characters like *, ., and more evil eval-type expressions. In your case, you could also check that the string is a valid nt sequence:
my $string = quotemeta shift;
die "Not a valid nucleotide sequence" if $string =~ /[^AGTC]/;
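To illustrate the \Q and \E form mentioned above, here is a minimal sketch. It assumes the query arrives as the first command-line argument; the sub name count_hits, the fallback query, and the sample sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping literal occurrences of $query in $seq.
sub count_hits {
    my ($query, $seq) = @_;
    # \Q...\E escapes any regex metacharacters in $query, so it is matched literally.
    my $count = () = $seq =~ /\Q$query\E/g;
    return $count;
}

# Assumption: the query is the first argument; fall back to a sample value.
my $query = @ARGV ? $ARGV[0] : 'ATC';
die "Not a valid nucleotide sequence\n" if $query =~ /[^AGTC]/;
print count_hits($query, 'GATCATCA'), "\n";
```

Once the string has been checked against /[^AGTC]/, the escaping is redundant, but it costs nothing and keeps the pattern safe if the validation is ever relaxed.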
As for the first question, one way would be to build a regex for each possible single-base mismatch, replacing one position at a time with a wildcard. An example:
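my $string = "TGAT";
# Build one pattern per position, replacing that base with the wildcard '.'
my @nts = map {
    my $tmp = $string;
    substr $tmp, $_, 1, '.';
    $tmp;
} (0 .. length($string) - 1);
# Join into a single alternation: .GAT|T.AT|TG.T|TGA.
my $groupings = join '|', @nts;
my $sample = "TGATTGGAATGTTAGAT";
# Note: /g resumes after each match, so overlapping hits are not reported
while ( $sample =~ /($groupings)/go )
{
    print "Matched $1 ending at position ", pos $sample, "\n";
}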
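The alternation above allows exactly one wildcard position. For a percentage threshold like the .75 in your question, another option is to skip the regex and compare the query against every window of the sequence, counting identical bases. This is a different technique from the one above, offered only as a sketch; the sub name fuzzy_hits and its return format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return [position, window_text, identity] for every window
# of $seq whose fraction of bases identical to $query is >= $threshold.
# Positions are 1-based, to match the numbering used in the question.
sub fuzzy_hits {
    my ($query, $seq, $threshold) = @_;
    my $len = length $query;
    my @hits;
    for my $start (0 .. length($seq) - $len) {
        my $window = substr $seq, $start, $len;
        # String XOR of equal-length strings gives "\0" wherever the characters agree;
        # tr/\0// counts those bytes without modifying anything.
        my $same = (($query ^ $window) =~ tr/\0//);
        my $identity = $same / $len;
        push @hits, [$start + 1, $window, $identity] if $identity >= $threshold;
    }
    return @hits;
}

for my $hit (fuzzy_hits('TGAT', 'TGATTGAA', 0.75)) {
    my ($pos, $text, $identity) = @$hit;
    printf "%s at position %d (%s)\n", $text, $pos,
        $identity == 1 ? 'exact' : 'acceptable mismatch';
}
```

Unlike the /g loop, this checks every offset, so overlapping hits are reported too. It only handles substitutions, not insertions or deletions.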
@_=qw;
Just another Perl hacker,;
;$_=q=print
"@_"= and eval;