Re: 'grouping' substrings?
by japhy (Canon) on Feb 01, 2006 at 15:59 UTC
|
I'd make use of Perl's @- and @+ arrays, which are populated by regex matches:
my $seq = "...";
my @groups;
push @groups, [$-[0], $+[0]] while $seq =~ /M+/g;
print "$_->[0] to $_->[1]\n" for @groups;
This gives me different values than you've shown, but I believe it's correct.
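As a minimal sketch of what the two arrays hold: after a successful match, $-[0] is the offset where the match starts and $+[0] is the offset just past its last character, so an inclusive end position would be $+[0] - 1. The short input string below is made up for illustration and is not the original sequence.

```perl
use strict;
use warnings;

# Illustrative input (an assumption, not the poster's data).
my $seq = "IIMMMOOMM";

my @groups;
while ($seq =~ /M+/g) {
    # $-[0]: offset of the first matched character (0-based).
    # $+[0]: offset just past the last matched character.
    my ($start, $end) = ($-[0], $+[0]);
    push @groups, [ $start, $end - 1 ];   # store an inclusive end
}

print "$_->[0] to $_->[1]\n" for @groups;
```

Storing $+[0] unchanged instead gives a half-open range, which is handy for substr($seq, $start, $end - $start).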
|
Yes, don't mind what I wrote, it was just an example.
I'll try your code ASAP.
I'll also check the index function.
Thanks to both of you!
|
Sorry to bother you again, but it doesn't seem to work.
For example, the first group gives 5-16, while it should be 5-15, the second 32-45, while it should be 32-44 from what I can calculate...
Are my maths poor???
Also, I can't understand what [$-[0], $+[0]] means. Any tips?
Sorry, I'm just beginning Perl...
|
|
|
Re: 'grouping' substrings?
by kwaping (Priest) on Feb 01, 2006 at 15:54 UTC
|
You might like Perl's index function.
|
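sub using_index {
    our $seq; *seq = \$_[0];    # alias $seq to the argument to avoid copying the string
    my @groups;
    my $pos = -1;               # position of the most recent 'M' found
    my $start = -1;             # start of the current run; -1 means no run is open
    for (;;) {
        my $new_pos = index($seq, 'M', $pos+1);
        if ($new_pos < 0) {
            # No more 'M's: close the last run, if there was one.
            if ($start >= 0) {
                push(@groups, [ $start, $pos ]);
            }
            last;
        }
        if ($start < 0) {
            $start = $new_pos;  # the first 'M' seen opens the first run
        }
        elsif ($new_pos - $pos > 1) {
            # A gap since the previous 'M': close that run and open a new one.
            push(@groups, [ $start, $pos ]);
            $start = $new_pos;
        }
        $pos = $new_pos;
    }
    return @groups;
}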
It would be simpler if there was a function that returned the next character which isn't 'M'.
As you can guess, it's much slower than the regexp approach. The regexp approach is 170% faster than (i.e. 2.7 times the speed of) the index method on the input you provided.
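For what it's worth, a helper of the kind wished for above can be approximated with pos() and a negated character class. This is only a sketch: next_non_m is a hypothetical name, the input string is made up, and no speed claim is intended (the helper copies the string on every call).

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first character at or after $from
# that is not 'M', or -1 if every remaining character is 'M'.
sub next_non_m {
    my ($str, $from) = @_;
    pos($str) = $from;                      # a /g match in scalar context starts here
    return $str =~ /[^M]/g ? $-[0] : -1;
}

my $seq = "IIMMMOOMM";                      # illustrative input
my @groups;
my $start = index($seq, 'M');
while ($start >= 0) {
    my $stop = next_non_m($seq, $start);
    $stop = length($seq) if $stop < 0;      # the run extends to the end of the string
    push @groups, [ $start, $stop - 1 ];    # inclusive, 0-based range
    $start = index($seq, 'M', $stop);
}

print "$_->[0] to $_->[1]\n" for @groups;
```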
Benchmark code:
Benchmark results:
Re: 'grouping' substrings?
by murugu (Curate) on Feb 01, 2006 at 16:29 UTC
|
use strict;
use warnings;
my $seq = "IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOO"
        . "OOO"
        . "OOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIM"
        . "MMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOO"
        . "OOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMM"
        . "MMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOO"
        . "OMMMMMMMMI";
while ($seq=~/(M+)/g) {
    my $l = pos($seq);
    print $l-length($&)+1," to ",$l,$/;
}
Regards, Murugesan Kandasamy use perl for(;;);
|
I don't know whether using $& is efficient or not.
Using $& is not efficient, and usually to be avoided. See the entry in perlvar for details.
Update: perhaps I was a bit imprecise. I'd say something that "imposes a considerable performance penalty on all regular expression matches" is inefficient, but I guess it depends on what type of inefficiency we're talking about.
|
That's not exactly true. $& is only inefficient if you have another regexp in your program which doesn't capture.
However, its use is discouraged, since captures can perform the same task without the "effect at a distance" of $&.
In this case, just replace $& with $1, and you're set.
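A sketch of that substitution, using a short made-up input in place of the full sequence. The pattern already captures with (M+), so $1 holds the same text as $&:

```perl
use strict;
use warnings;

my $seq = "IIMMMOOMM";   # illustrative input, not the original sequence

my @ranges;
while ($seq =~ /(M+)/g) {
    my $l = pos($seq);   # offset just past the match, i.e. the 1-based position of the last M
    push @ranges, [ $l - length($1) + 1, $l ];
}

print "$_->[0] to $_->[1]\n" for @ranges;
```

The printed positions are 1-based, as in the code being replied to.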
Re: 'grouping' substrings?
by Cristoforo (Curate) on Feb 02, 2006 at 02:13 UTC
|
Having the luxury of time to consider it ;-) , here was my approach using index.
my @pos;
my $start = index($str, 'M');
while ($start != -1) {
    my $pos;
    my $i = 0;
    1 while ($pos = index($str, 'M', $start + $i)) == $start + $i++;
    push @pos, [$start, $start + $i-2];
    $start = $pos;
}
Update - fix three lines
my $i = 1;
$i++ while ($pos = index($str, 'M', $start + $i)) == $start + $i;
push @pos, [$start, $start + $i-1];
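For convenience, here is a sketch of the loop with the three updated lines applied. The wrapper sub m_runs and the short test string are assumptions added for illustration; the loop body is otherwise as posted.

```perl
use strict;
use warnings;

# m_runs is a hypothetical wrapper name; it returns inclusive, 0-based ranges.
sub m_runs {
    my ($str) = @_;
    my @pos;
    my $start = index($str, 'M');
    while ($start != -1) {
        my $pos;
        my $i = 1;
        # Advance $i while the next 'M' is directly adjacent to the current run.
        $i++ while ($pos = index($str, 'M', $start + $i)) == $start + $i;
        push @pos, [$start, $start + $i - 1];
        $start = $pos;   # start of the next run, or -1 when there are no more
    }
    return @pos;
}

print "$_->[0] to $_->[1]\n" for m_runs("IIMMMOOMM");
```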
Re: 'grouping' substrings?
by ysth (Canon) on Feb 02, 2006 at 00:30 UTC
|
Others seem to have interpreted this as "find all groups of one or more M's".
On the off chance that you actually meant 6 or more M's, try this
modification of japhy's solution:
my $seq = "...";
my @groups;
push @groups, [$-[0], $+[0]-1] while $seq =~ /M{6,}/g;
print "$_->[0] to $_->[1]\n" for @groups;
(where the displayed positions are 0-based.)
Re: 'grouping' substrings?
by Skeeve (Parson) on Feb 02, 2006 at 12:56 UTC
|
$seq = "IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOOOOO"
     . "OOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIM"
     . "MMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOO"
     . "OOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMM"
     . "MMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOO"
     . "OMMMMMMMMI";
$i=0;
$_= $seq;
s/([^M]*)(M*)/{ my $j=$i+length($1); $i=$j+length($2); $j==$i ? "" : "pos $j-$i\n" }/ge;
print $seq, "\n", $_, "\n";
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e