A regex that does this, but not that?

bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: A regex that does this, but not that? by sauoq (Abbot) on Nov 14, 2003 at 23:48 UTC
I'm not sure I entirely understand your requirements, but `/t(?!est\b)\w*t/` would match any 't' not followed by 'est' and a word break, then match 0 or more word characters, and then match one last 't'. For example (anchors added): `#!/usr/bin/perl -w use strict; /^t(?!est\b)\w*t$/ and print while <DATA>; __DATA__ test testset tot tesset tt` prints everything but 'test'. Adapting it slightly for your original problem like this `#!/usr/bin/perl -lw use strict; $_ = "thought test tot 1 2 3 tesset"; s/t(?!est\b)\w*t\s*//g; print;` prints "`test 1 2 3`" just as you want. -sauoq "My two cents aren't worth a dime.";
Re: A regex that does this, but not that? by pg (Canon) on Nov 14, 2003 at 23:49 UTC
`my $var = "thought test tot 1 2 3 tesset"; $var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge; print $var;`
Re: Re: A regex that does this, but not that? by bradcathey (Prior) on Nov 15, 2003 at 00:10 UTC
Thanks pg, that did exactly what I wanted. I have used conditionals in regexes before, but couldn't see the application here. Thanks to all the monks who replied. -Brad "A little yeast leavens the whole dough."
Re: A regex that does this, but not that? by Abigail-II (Bishop) on Nov 15, 2003 at 00:11 UTC
It's not clear what you want. Do you want to remove all words that aren't "test" or numbers? Do you want to remove the words "thought", "tot" and "tesset"? Do you want to remove all words, except the 2nd, 4th, 5th and 6th? Do you want to remove all words that start and end with a "t", but don't have "es" (and nothing else) between them? Being able to properly formulate what you want a regex to do solves 90% of the problem. Stating your problem by a single example just leaves people guessing. Abigail
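A minimal sketch of the point above: three of the listed readings, written as word filters, agree on the sample sentence but diverge on a second input. The names `%reading` and `apply_reading` and the second sentence are made up for illustration, and words are assumed to be separated by whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three hypothetical readings of the example; each returns true to KEEP a word.
my %reading = (
    # keep only "test" and numbers
    keep_test_and_numbers => sub { $_[0] eq 'test' || $_[0] =~ /^\d+$/ },
    # drop exactly the three words that vanished in the sample
    drop_listed_words     => sub { $_[0] !~ /^(?:thought|tot|tesset)$/ },
    # drop words that start and end with "t", unless the word is "test"
    drop_t_words          => sub { $_[0] eq 'test' || $_[0] !~ /^t\w*t$/ },
);

# Apply one reading to a sentence, word by word.
sub apply_reading {
    my ($name, $sentence) = @_;
    return join ' ', grep { $reading{$name}->($_) } split ' ', $sentence;
}

for my $sentence ('thought test tot 1 2 3 tesset', 'tent foo test 42') {
    print "input: $sentence\n";
    print "  $_: ", apply_reading($_, $sentence), "\n" for sort keys %reading;
}
```

All three readings give "test 1 2 3" for the original sentence. For "tent foo test 42" they give "test 42", "tent foo test 42" and "foo test 42" respectively, so the single example cannot tell them apart.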
Re: Re: A regex that does this, but not that? by bradcathey (Prior) on Nov 15, 2003 at 01:36 UTC
Abigail-II, I spent quite a bit of time trying to craft my example carefully, so that if there was a regex solution to return the result I specified, I'd have my answer. pg got it perfectly. But just in case you're still interested (I know you're one of the regex gurus around the monastery, and I have always appreciated your thoroughness):

1. I want to delete any words that start with "t", end with "t", but do not contain any other "t"s within, except for the word "test".
2. The result should only be the words: "test" and any other non tt words. "1 2 3" was just an example.
3. The order of words, the number of words, or the content of any other words not "t\w+t", should not be a factor.

I'd still love to hear your thoughts as I am trying to really ramp up my coding skills. Thanks. -Brad "A little yeast leavens the whole dough."
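The three rules above can be sketched as a plain word filter, which is handy as a reference to check any one-line regex against. This is only one reading of the rules: `is_deletable` and `filter_words` are hypothetical names, words are assumed to be whitespace-separated with no attached punctuation, and "tt" is treated as deletable (use `[^t]+` instead of `[^t]*` to keep it).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if a word should be deleted: starts with "t", ends with "t",
# has no other "t" in between, and is not the word "test".
sub is_deletable {
    my ($word) = @_;
    return 0 if $word eq 'test';
    return $word =~ /^t[^t]*t$/ ? 1 : 0;
}

# Rebuild the sentence from the words that survive the filter.
sub filter_words {
    my ($sentence) = @_;
    return join ' ', grep { !is_deletable($_) } split ' ', $sentence;
}

print filter_words('thought test tot 1 2 3 tesset'), "\n";   # test 1 2 3
print filter_words('target testament tempest taboo'), "\n";  # testament taboo
```

Splitting into words first sidesteps the question of what `.` or `[^t]` may match across spaces, at the cost of normalising the whitespace between words.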
Re: Re: Re: A regex that does this, but not that? by danger (Priest) on Nov 15, 2003 at 04:57 UTC
Well, pg's solution works for the limited input provided and you haven't given any further particulars regarding input. That solution breaks just changing the first word from "thought" to "though": `my $var = "though test tot 1 2 3 tesset"; $var =~ s/(t.*?t)/($1 ne "test") ? "" : $1/ge; print $var; # prints: esoesset` But now you mention a further constraint, that the words to be deleted may not contain any 't's inside, which is not inferrable from your earlier posts at all. Providing a good specification is much more than providing a sample case (but providing test cases is important). Anyway, here's a go at your new specs: `my $var = <<TT; target blah foo test thought 123 though tempest testament though tightest treatment thermostat tantamount taboo TT $var =~ s/(?!\btest\b)(\bt[^t\W]*t\b)//g; print $var; __END__ ## Result: blah foo test 123 though testament though tightest treatment thermostat tantamount taboo` So, all the 't.*t' words on the second line remain because they contain a 't' character within. All the 't.*t' words on the first line get deleted except for 'test'.
Re: Re: Re: A regex that does this, but not that? by Cody Pendant (Prior) on Nov 15, 2003 at 05:17 UTC
> any words that start with "t", end with "t", but do not contain any other "t"s within

OK so that's `\bt[^t]+t\b` -- word-boundary, then a t, then one or more other characters not a t, then a t, then a word boundary. Apart from the abbreviation "tt" this should be fine. So "tent", "tesseract", "tot", "tort" and "test" itself will match this pattern. However, "testament" will fail it because of the "t" in the middle. Then you need a special case for "test" itself, which you can do with the /e modifier and the ternary operator, as in pg's example above. So something like this: `#!/usr/bin/perl -w use strict; my $words='test Buffy testament Anya tot Willow tesseract Faith tent'; $words =~ s/\b(t[^t]+t)\b/$1 eq "test" ? $1 : ''/ge; print $words; # prints 'test Buffy testament Anya Willow Faith';` Where the regex means "Find words matching t, something-not-t, then t at the end. Replace them with nothing, unless they're the word test, in which case, replace them with themselves". You could replace the ternary thing with this more longwinded version if you liked: `$words =~ s/\b(t[^t]+t)\b/ my $temp = $1; if($temp eq 'test'){ $temp }else{ '' }/xge;`

`($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print`
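One thing worth checking in the pattern above is that `[^t]` matches any character that is not a "t", including spaces, so a match can run across several words. The sketch below compares it with the `[^t\W]` class from danger's reply; the sub names and the input sentence are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The substitution as posted: [^t] also matches spaces and punctuation.
sub strip_any_non_t {
    my ($text) = @_;
    $text =~ s/\b(t[^t]+t)\b/$1 eq 'test' ? $1 : ''/ge;
    return $text;
}

# Same substitution with [^t\W]: only word characters other than "t".
sub strip_word_chars_only {
    my ($text) = @_;
    $text =~ s/\b(t[^t\W]+t)\b/$1 eq 'test' ? $1 : ''/ge;
    return $text;
}

my $text = 'tea is hot, test tot';
print '[', strip_any_non_t($text), "]\n";        # [, test ]
print '[', strip_word_chars_only($text), "]\n";  # [tea is hot, test ]
```

With `[^t]`, the span "tea is hot" is treated as one t...t match and deleted. With `[^t\W]`, only the single word "tot" goes. Both versions leave the surrounding whitespace behind.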
Re: Re: Re: Re: A regex that does this, but not that? by Anonymous Monk on Nov 15, 2003 at 06:38 UTC
Re: Re: Re: Re: Re: A regex that does this, but not that? by Cody Pendant (Prior) on Nov 15, 2003 at 07:32 UTC
Re: A regex that does this, but not that? by Abigail-II (Bishop) on Nov 16, 2003 at 01:51 UTC
> I spent quite a bit of time trying to craft my example carefully, so that if there was a regex solution to return the result I specified, I'd have my answer.

But the problem is that you left it at the example. I could have given you a couple of regexes that solved your example, but would probably have failed to do what you wanted on the second example you tried.

> pg got it perfectly.

Then you and he got lucky. If he had come up with a different regexp that solved your one example, but that would do something else on other sentences, he would have wasted time formulating a useless answer. However, is it really true that pg's answer got it right? Your requirements say:

> I want to delete any words that start with "t", end with "t", but do not contain any other "t"s within, except for the word "test".

and pg's regex is: `s/(t.*?t)/($1 ne "test") ? "" : $1/ge;` Now, to me that regex just deletes strings starting with a t, and ending with the next t, with the exception of the word "test". So, let's try it on another example: `$_ = "this is the wristwatch"; s/(t.*?t)/($1 ne "test") ? "" : $1/ge; print; __END__ he wrisch` Now, that might be exactly what you had in mind, but it doesn't suit the requirements. Abigail
Re^3: A regex that does this, but not that? by Aristotle (Chancellor) on Nov 22, 2003 at 09:13 UTC
`s/\s*\bt(?!est)[^t\W]*t\b//g;` Makeshifts last the longest.
Re: A regex that does this, but not that? by BrowserUk (Patriarch) on Nov 14, 2003 at 23:49 UTC
`$var =~ s[\bt(?!est).*?t\b\s*][]g;` Update: sauoq's right. I omitted a \b. `perl> $var = "thought testament test tot 1 2 3 tesset"; perl> $var =~ s[\bt(?!est\b).*?t\b\s*][]g; print $var; test 1 2 3` Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Hooray! Wanted!
Re: Re: A regex that does this, but not that? by sauoq (Abbot) on Nov 15, 2003 at 00:00 UTC
But, if `$var` contained "testament", for example, that would fail. -sauoq "My two cents aren't worth a dime.";
Re: Re: Re: A regex that does this, but not that? by BrowserUk (Patriarch) on Nov 15, 2003 at 01:11 UTC
Yep! Needs an extra \b. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Hooray! Wanted!
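A small sketch of what the extra `\b` changes, with both substitutions written here with `.*?` and `\s*`. The sub names are made up for illustration. Without the boundary, `(?!est)` refuses every word that merely begins with "test"; with it, only the whole word "test" is protected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookahead without \b: any word starting with "test" is left alone.
sub strip_without_boundary {
    my ($text) = @_;
    $text =~ s[\bt(?!est).*?t\b\s*][]g;
    return $text;
}

# Lookahead with \b: only the complete word "test" is left alone.
sub strip_with_boundary {
    my ($text) = @_;
    $text =~ s[\bt(?!est\b).*?t\b\s*][]g;
    return $text;
}

my $text = 'thought testament test tot 1 2 3 tesset';
print '[', strip_without_boundary($text), "]\n";  # [testament test 1 2 3 ]
print '[', strip_with_boundary($text), "]\n";     # [test 1 2 3 ]
```

Note that `.*?` is not limited to word characters, so on input such as "the cat sat" either version would match "the cat " as a single t...t span.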

