bradcathey has asked for the wisdom of the Perl Monks concerning the following question:
Every monk loves a good regex question, right, especially the golfers. Here goes:
my $var = "thought test tot 1 2 3 tesset";
I want to end up with the result:
test 1 2 3
I tried:
$var =~ s/((t.*?t)([\w ]*test[\w ]*)(t.*?t))/$3/g;
but ended up with (not surprisingly):
test tot 1 2 3
Is there the possibility of (in plain English):
$var =~ s/(t.*?t) but not (test)//g;
Would love to keep it down to one line, which started this whole thing. Thanks, monks.
Update:
I "joined" the monastery to learn to be a better Perl programmer (also, to write better defined questions :-). This node has been a great help; regexes are almost a language of their own. pg used conditionals, danger and Cody_Pendant showed practical examples of word boundaries. Sometimes you just need to be nudged over the learning hump. Now, I have some code to study. Thanks.
—Brad "A little yeast leavens the whole dough."
Re: A regex that does this, but not that?
by sauoq (Abbot) on Nov 14, 2003 at 23:48 UTC
I'm not sure I entirely understand your requirements, but /t(?!est\b)\w*t/ would match any 't' not followed by 'est' and a word break then match 0 or more word characters and then match one last 't'. For example (anchors added):
#!/usr/bin/perl -w
use strict;
/^t(?!est\b)\w*t$/ and print while <DATA>;
__DATA__
test
testset
tot
tesset
tt
prints everything but 'test'.
Adapting it slightly for your original problem like this
#!/usr/bin/perl -lw
use strict;
$_ = "thought test tot 1 2 3 tesset";
s/t(?!est\b)\w*t\s*//g;
print;
prints "test 1 2 3" just as you want.
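For anyone studying the substitution above, here is a minimal sketch that spreads the same pattern out with /x so each piece carries a comment. The variable names ($t_word, $cleaned) are only illustrative. Note that the pattern has no leading \b, so it assumes the input contains no 't' in the middle of a word that should be kept.

```perl
use strict;
use warnings;

# Same pattern as in the post above, written with /x for readability.
my $t_word = qr{
    t            # a literal 't'
    (?!est\b)    # not followed by "est" and then a word boundary
    \w*          # any run of word characters (backtracks to find a final 't')
    t            # a closing 't'
    \s*          # plus any trailing whitespace
}x;

my $var = "thought test tot 1 2 3 tesset";
(my $cleaned = $var) =~ s/$t_word//g;

# Brackets make leading or trailing whitespace visible.
print "[$cleaned]\n";
```

Because the pattern swallows whitespace after each removed word but not before it, the result keeps a trailing space when the last word is removed.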
-sauoq
"My two cents aren't worth a dime.";
Re: A regex that does this, but not that?
by pg (Canon) on Nov 14, 2003 at 23:49 UTC
my $var = "thought test tot 1 2 3 tesset";
$var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
print $var;
Re: A regex that does this, but not that?
by Abigail-II (Bishop) on Nov 15, 2003 at 00:11 UTC
It's not clear what you want. Do you want to remove all
words that aren't "test" or numbers? Do you want to remove
the words "thought", "tot" and "tesset"? Do you want to
remove all words, except the 2nd, 4th, 5th and 6th? Do you
want to remove all words that start and end with a "t",
but don't have "es" (and nothing else) between them?
Being able to properly formulate what you want a regex to do
solves 90% of the problem. Stating your problem by simple
example just leaves people guessing.
Abigail
my $var = "though test tot 1 2 3 tesset";
$var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
print $var; # prints: esoesset
But, now you mention a further constraint that the words to be deleted
may not contain any 't's inside, which is not inferrable from your earlier
posts at all. Providing a good specification is much more than providing a
sample case (but providing test cases *is* important).
Anyway, here's a go at your new specs:
my $var = <<TT;
target blah foo test thought 123 though tempest
testament though tightest treatment thermostat tantamount taboo
TT
$var =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g;
print $var;
__END__
## Result:
blah foo test 123 though
testament though tightest treatment thermostat tantamount taboo
So, all the 't.*t' words on the second line remain because they contain
a 't' character within. All the 't.*t' words on the first line get
deleted except for 'test'.
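Since test cases are the point here, the following is a rough sketch of how the spec could be written down as a small table and checked against the substitution above. The helper name strip_t_words and the word list are my own assumptions for illustration, not part of the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the post above.
sub strip_t_words {
    my ($text) = @_;
    $text =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g;
    return $text;
}

# 1 = the spec says the word should be deleted, 0 = it should stay.
my %should_delete = (
    test      => 0,
    tot       => 1,
    thought   => 1,
    tesset    => 1,
    target    => 1,
    though    => 0,
    testament => 0,
    tightest  => 0,
);

# Run every word through the helper on its own and record mismatches.
my @failures;
for my $word (sort keys %should_delete) {
    my $deleted = strip_t_words($word) eq '' ? 1 : 0;
    push @failures, $word if $deleted != $should_delete{$word};
}

print @failures ? "FAILED: @failures\n" : "all cases pass\n";
```

Adding a new edge case is then a one-line change to the table, which makes disagreements about the spec show up as failing words rather than as guesses.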
any words that start with "t", end with "t", but do not contain any other "t"s within
OK so that's \bt[^t]+t\b -- word-boundary, then a t, then one or more other characters not a t, then a t, then a word boundary.
Apart from the abbreviation "tt" this should be fine.
So "tent", "tesseract", "tot", "tort" and "test" itself will match this pattern.
However, "testament" will fail it because of the "t" in the middle.
Then you need a special case for "test" itself, which you can do with the /e modifier and the ternary operator, as in pg's example above.
So something like this:
#!/usr/bin/perl -w
use strict;
my $words='test Buffy testament Anya tot Willow tesseract
Faith tent';
$words =~ s/\b(t[^t]+t)\b/$1 eq "test" ? $1 : ''/ge;
print $words;
# prints 'test Buffy testament Anya Willow Faith';
Where the regex means "Find words matching t, something-not-t, then t at the end. Replace them with nothing, unless they're the word test, in which case, replace them with themselves".
You could replace the ternary thing with this more longwinded version if you liked:
$words =~ s/\b(t[^t]+t)\b/
my $temp = $1;
if($temp eq 'test'){
$temp
}else{
''
}/xge;
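One caveat worth checking: [^t] matches any character that is not a 't', spaces included, so a match can run from one word into the next. The sketch below compares it with [^t\W], which only allows word characters other than 't'. The sample sentence and variable names are made up for the comparison.

```perl
use strict;
use warnings;

my $text = "the cat sat on test";

# [^t] also matches spaces, so "the cat" is treated as one t...t "word".
(my $loose = $text) =~ s/\b(t[^t]+t)\b/$1 eq "test" ? $1 : ''/ge;

# [^t\W] is restricted to word characters other than 't'.
(my $strict = $text) =~ s/\b(t[^t\W]+t)\b/$1 eq "test" ? $1 : ''/ge;

print "loose:  [$loose]\n";
print "strict: [$strict]\n";
```

With the loose class the phrase "the cat" is deleted as a unit, while the stricter class leaves the sentence untouched because none of its words other than "test" start and end with 't'.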
($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print
$_ = "this is the wristwatch";
s/(t.*?t)/($1 ne "test") ? "" : $1/ge;
print;
__END__
he wrisch
Now, that might be exactly what you had in mind, but it
doesn't suit the requirements.
Abigail
s/\s*\bt(?!est)[^t\W]*t\b//g;
Makeshifts last the longest.
Re: A regex that does this, but not that?
by BrowserUk (Patriarch) on Nov 14, 2003 at 23:49 UTC
$var =~ s[\bt(?!est).*?t\b\s*][]g;
Update: sauoq's right. I omitted a \b.
perl> $var = "thought testament test tot 1 2 3 tesset";
perl> $var =~ s[\bt(?!est\b).*?t\b\s*][]g; print $var;
test 1 2 3
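To see what the omitted \b changes, here is a small side-by-side sketch of the two versions on the same input. The variable names $before and $after are only for illustration.

```perl
use strict;
use warnings;

my $input = "thought testament test tot 1 2 3 tesset";

# Without \b, the lookahead rejects every word that merely starts with
# "test", so "testament" is kept along with "test".
(my $before = $input) =~ s[\bt(?!est).*?t\b\s*][]g;

# With \b, only the exact word "test" is protected.
(my $after = $input) =~ s[\bt(?!est\b).*?t\b\s*][]g;

print "before fix: [$before]\n";
print "after fix:  [$after]\n";
```

The only difference between the two results is whether "testament" survives, which is the case the update addresses.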
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
Wanted!