Re: Regexp: How to match in middle but not the ends?
by ikegami (Patriarch) on Jul 28, 2006 at 20:46 UTC
Don't use $&! It slows down every regexp in your program (including those in modules) that doesn't have captures. Use captures instead.
while ($string =~ /([CSH][CSHL]*[CSH])/g) {
print "$1, ";
}
Use join to avoid the trailing comma.
print join ', ', $string =~ /([CSH][CSHL]*[CSH])/g;
An alternative approach would be to strip out the offending L characters.
for ($string) {
s/-L+|L+-/-/g;
s/^L+//;
s/L+$//;
print join ', ', /([CSHL]{2,})/g;
}
Update: Added s/^L// and s/L$//.
Update: Changed L to L+.
Update: Accidentally changed too many things to "+"s. Fixed.
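For anyone who wants to compare the two approaches above side by side, here is a minimal sketch. The sub names match_direct and match_after_strip are made up for illustration, and the sample string is the one quoted elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: capture runs that start and end with C, S or H.
sub match_direct {
    my ($string) = @_;
    return $string =~ /([CSH][CSHL]*[CSH])/g;
}

# Hypothetical helper: strip the L runs next to a boundary first,
# then capture whatever runs of two or more letters remain.
sub match_after_strip {
    my ($string) = @_;              # private copy, the caller's string is untouched
    $string =~ s/-L+|L+-/-/g;
    $string =~ s/^L+//;
    $string =~ s/L+$//;
    return $string =~ /([CSHL]{2,})/g;
}

my $sample = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
print join(', ', match_direct($sample)), "\n";        # CSH, CSH, CSLH, CCHLSHC
print join(', ', match_after_strip($sample)), "\n";   # CSH, CSH, CSLH, CCHLSHC
```

Both subs rely on a /g match in list context returning every capture, which is also what makes the join one-liner work.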
|
No, he shouldn't use "+", since he doesn't want to strip any sequence of "L". For example, your solution fails to match "LL" in "---LLLL---".
My solution was lacking since I wasn't checking for an L at the start or end of the string. Fixes:
$string =~ s/-L|L-/-/g;
$string =~ s/^L//;
$string =~ s/L$//;
or
$string =~ s/^L|(?<=-)L|L(?=-)|L$//g;
or
# Does a bit more than stripping, but in an inconsequential fashion.
$string =~ s/^L|-L|L-|L$/-/g;
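The difference between stripping a single L and stripping L+ is easiest to see on the "---LLLL---" example. This is only a sketch: the sub names strip_one and strip_run are invented here, and strip_run is my own "+" variant of the lookaround substitution, not something posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove only the single L touching each boundary.
sub strip_one {
    my ($string) = @_;
    $string =~ s/^L|(?<=-)L|L(?=-)|L$//g;
    return $string;
}

# Hypothetical helper (assumed variant): remove the whole run of L's
# touching a boundary.
sub strip_run {
    my ($string) = @_;
    $string =~ s/^L+|(?<=-)L+|L+(?=-)|L+$//g;
    return $string;
}

print strip_one('---LLLL---'), "\n";   # ---LL---
print strip_run('---LLLL---'), "\n";   # ------
```

strip_one leaves the inner "LL" in place, which is the behaviour argued for above; strip_run removes the whole run.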
|
print join ', ', $string =~ /([CSH][CSHL]+[CSH])/g;
requires at least three characters, so it should be print join ', ', $string =~ /([CSH][CSHL]*[CSH])/g; (as you mention elsewhere in this thread).
Re: Regexp: How to match in middle but not the ends?
by Hue-Bond (Priest) on Jul 28, 2006 at 20:36 UTC
Not being a regex expert, I've come up with this:
my $string = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
while ($string =~ /([CHS][LCHS]+[CHS])/xg) {
print "$1\n";
}
__END__
CSH
CSH
CSLH
CCHLSHC
Re: Regexp: How to match in middle but not the ends?
by Fletch (Bishop) on Jul 28, 2006 at 20:32 UTC
Perhaps you should read perlre and see if a non-greedy modifier /[CSH][CSHL]+?/ helps? Or explicitly require that the last character isn't an L, /[CSH][CSHL]*[CSH]/g.
|
/[CSH][CSHL]+?/ doesn't work. It would incorrectly match "CL" in "---CL---", and it will never match more than two characters.
/[CSH][CSHL]*[CSH]/ works.
If there weren't a two-character minimum, we'd have to use zero-width lookaheads and/or lookbehinds.
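To make both points concrete, here is a small sketch. The first match shows the non-greedy pattern stopping on an L. The sub match_any_length is a made-up name, and its pattern is just one way lookarounds could be used if single letters were allowed; it is an assumption of mine, not something posted in this thread.

```perl
use strict;
use warnings;

# The non-greedy quantifier stops at the shortest possible match,
# so it is content to end on an L.
my ($lazy) = '---CL---' =~ /([CSH][CSHL]+?)/;
print "$lazy\n";    # CL

# Assumed variant with no two-character minimum: the lookahead rejects
# a leading L, the lookbehind rejects a trailing L.
sub match_any_length {
    my ($string) = @_;
    return $string =~ /((?!L)[CSHL]+(?<!L))/g;
}

print join(', ', match_any_length('---LL--C----LCSH-------CSHL----')), "\n";
# C, CSH, CSH
```

The greedy [CSHL]+ first grabs the whole run and then backtracks until the lookbehind no longer sees an L, which is why "CSHL" yields "CSH".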
Re: Regexp: How to match in middle but not the ends?
by explorer (Chaplain) on Jul 28, 2006 at 20:36 UTC
my $string = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
# I want to loop through these strings and
# find maximal sets of two or more of the capital letters
# that do not begin or end with L
while ($string =~ /([CSH][CSHL]*[CSH])/g) {
print "$1, ";
}
__OUTPUT__
CSH, CSH, CSLH, CCHLSHC,
Re: Regexp: How to match in middle but not the ends?
by Skeeve (Parson) on Jul 28, 2006 at 23:45 UTC
TIMTOWTDI, assuming your string need not be checked for containing only legal characters:
my $string = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
foreach (split /L*-+L*/, $string) {
print $_,"\n" if length($_)>=2;
}
Update: I just noticed: This will fail if the string starts or ends with "L" and not with "-" or anything else. So making the $string in the split a "-$string-" is one workaround.
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
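The padding workaround mentioned in the update can be sketched like this. The sub name runs_via_split is hypothetical, and the length test uses >= 2 to follow the "two or more" wording of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split approach. Padding with "-" lets
# the separator pattern also swallow an L run at the very start or end.
sub runs_via_split {
    my ($string) = @_;
    return grep { length($_) >= 2 } split /L*-+L*/, "-$string-";
}

# Without the padding, the leading "L" and trailing "L" would survive.
print join(', ', runs_via_split('LCSH--CSL')), "\n";   # CSH, CS
```

The padding produces an empty leading field, which the length filter discards along with any single letters.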
Re: Regexp: How to match in middle but not the ends?
by furry_marmot (Pilgrim) on Jul 31, 2006 at 10:34 UTC
Taking the poster's exact description, "I want to loop through these strings and find maximal sets of two or more of the capital letters that do not begin or end with L." I came up with this:
my $string = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
my @patterns = $string =~ /([^-L][LCSH]*[^-L])/g;
print join ', ', @patterns;
The result is this:
CSH, CSH, CSLH, CCHLSHC
Is that what you were looking for?
--marmot
Re: Regexp: How to match in middle but not the ends?
by TedPride (Priest) on Jul 30, 2006 at 17:56 UTC
Perhaps the simplest solution is a two-part regex sequence:
$_ = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
while (m/([A-Z]+)/g) {
($s = $1) =~ s/^L+|L+$//g;
print "$s\n" if $s;
}
EDIT: Oops, missed that.
|
$_ = '---LL--C----LCSH-------CSHL-------LCSLH-------LCCHLSHCL----';
while (m/([A-Z]+)/g) {
($s = $1) =~ s/^L+|L+$//g;
print "$s\n" if length $s >= 2;
}