regexp find last word

Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Re: regexp find last word by borisz (Canon) on Mar 02, 2005 at 14:43 UTC
Reverse the string and replace the first: `$_ = "fox comes and fox goes into forest"; $_ = reverse; s/xof/xof eht/; $_ = reverse; print;` Boris
Re^2: regexp find last word by Anonymous Monk on Mar 02, 2005 at 14:54 UTC
That just puts 'the' in front of the last fox, without considering the forest at all. It doesn't put 'the' in front of the last fox before a forest, except when that also happens to be the last fox on the line. That is, it will turn `fox comes and fox goes into the forest. No fox left` into `fox comes and fox goes into the forest. No the fox left`
Re^3: regexp find last word by Fletch (Bishop) on Mar 02, 2005 at 15:04 UTC
What, you mean he didn't see the forest for the thes?
Re^3: regexp find last word by borisz (Canon) on Mar 02, 2005 at 15:03 UTC
That's right, I was just too lazy to reverse forest.* as well. But the idea is to reverse the string and replace the first match. Boris
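For anyone who wants to try the reversal idea with the forest included, here is a minimal sketch. It assumes the goal is the last fox before the last forest, and the helper name `add_the_reversed` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: insert "the " before the last "fox" that precedes
# the last "forest", by working on the reversed string.
sub add_the_reversed {
    my ($text) = @_;
    my $rev = reverse $text;    # scalar context: reverses the characters
    # In the reversed string the last "forest" is the first "tserof", and
    # the nearest "xof" after it is the last fox before that forest.
    $rev =~ s/(tserof.*?)xof/${1}xof eht/;
    return scalar reverse $rev;
}

print add_the_reversed("fox comes and fox goes into forest"), "\n";
print add_the_reversed("fox comes and fox goes into the forest. No fox left"), "\n";
```

Because the search starts at the reversed forest, a fox that comes after the forest in the original line is left alone.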
Re: regexp find last word by borisz (Canon) on Mar 02, 2005 at 14:49 UTC
Or eat more chars at the start. `$_ = "fox comes and fox goes into forest"; s/(.*)(fox.+?forest)/$1the $2/; print;` Boris
Re^2: regexp find last word by holli (Abbot) on Mar 02, 2005 at 14:57 UTC
Basically the same principle as my answer, but slightly slower.
`use Benchmark;
print "holli\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(fox.+)(fox.+?forest)/${1}the $2/; });
print "borisz\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(.*)(fox.+?forest)/$1the $2/; });
print "sh1tn\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(?<=fox)(.+?)(fox)/$1the $2/; });
print "Roy Jonhson\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(?=fox(?:(?!.*fox).)*forest)/the /; });`
prints:
`holli
timethis 999999: 10 wallclock secs ( 9.87 usr + 0.00 sys = 9.87 CPU) @ 101275.98/s (n=999999)
borisz
timethis 999999: 14 wallclock secs (11.08 usr + 0.00 sys = 11.08 CPU) @ 90268.91/s (n=999999)
sh1tn
timethis 999999: 12 wallclock secs (10.41 usr + 0.01 sys = 10.42 CPU) @ 95959.98/s (n=999999)
Roy Jonhson
timethis 999999: 10 wallclock secs (10.19 usr + 0.02 sys = 10.20 CPU) @ 98000.69/s (n=999999)`
/me wins ;-)
holli, /regexed monk/
Re^3: regexp find last word by Crackers2 (Parson) on Mar 02, 2005 at 15:45 UTC
That's all well for the speed, but let's check correctness:
Testing string: fox comes and fox goes into forest
`Holli  : fox comes and the fox goes into forest
Roy    : fox comes and the fox goes into forest
Borisz : fox comes and the fox goes into forest
shltn  : fox comes and the fox goes into forest`
Every solution got that one right.
Testing string: fox comes fox walks and fox goes into forest
`Holli  : fox comes fox walks and the fox goes into forest
Roy    : fox comes fox walks and the fox goes into forest
Borisz : fox comes fox walks and the fox goes into forest
shltn  : fox comes the fox walks and fox goes into forest`
Three foxes trip up shltn.
Testing string: pig comes and fox goes into forest
`Holli  : pig comes and fox goes into forest
Roy    : pig comes and the fox goes into forest
Borisz : pig comes and the fox goes into forest
shltn  : pig comes and fox goes into forest`
And just one fox trips up both Holli and shltn.
So I'm just going to pretend I have judging power and disqualify Holli's and shltn's entries, which makes Roy Jonhson the new winner. :)
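A correctness check along these lines can be sketched as follows. The harness itself is an assumption (the script actually used above was not posted); the three patterns are the ones holli, borisz and ikegami give elsewhere in this thread:

```perl
use strict;
use warnings;

# Candidate substitutions, copied from replies in this thread.
# Each sub works on a copy and returns the modified string.
my %candidate = (
    holli   => sub { my $s = shift; $s =~ s/(fox.+)(fox.+?forest)/${1}the $2/; $s },
    borisz  => sub { my $s = shift; $s =~ s/(.*)(fox.+?forest)/$1the $2/;      $s },
    ikegami => sub { my $s = shift; $s =~ s/(fox(?:(?!fox).)*forest)/the $1/;  $s },
);

my @tests = (
    "fox comes and fox goes into forest",
    "fox comes fox walks and fox goes into forest",
    "pig comes and fox goes into forest",
);

for my $input (@tests) {
    print "Testing string: $input\n";
    for my $name (sort keys %candidate) {
        printf "  %-8s: %s\n", $name, $candidate{$name}->($input);
    }
}
```

The single-fox string is the interesting one: a pattern that requires two foxes leaves it unchanged.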
Re^4: regexp find last word by Anonymous Monk on Mar 03, 2005 at 10:46 UTC
Re^3: regexp find last word by Roy Johnson (Monsignor) on Mar 02, 2005 at 16:00 UTC
With corrected versions of yours and mine, plus ikegami's suggested alteration of mine:
`use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese( -2, {
    holli   => sub { $_ = "fox comes and fox goes into forest"; s/(fox.+)?(fox.+?forest)/${1}the $2/; },
    Roy     => sub { $_ = "fox comes and fox goes into forest"; s/(?=fox(?:(?!fox).)*forest)/the /; },
    ikegami => sub { $_ = "fox comes and fox goes into forest"; s/(fox(?:(?!fox).)*forest)/the $1/; },
});`
`           Rate ikegami   holli     Roy
ikegami 34714/s      --     -2%    -27%
holli   35541/s      2%      --    -25%
Roy     47659/s     37%     34%      --`
Caution: Contents may have been coded under pressure.
Re^3: regexp find last word by ikegami (Patriarch) on Mar 02, 2005 at 15:45 UTC
You can remove the (redundant) .* from Roy Johnson's. I don't know if it helps any, but you can remove the lookahead too: `s/(fox(?:(?!fox).)*forest)/the $1/;`
Re^4: regexp find last word by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:50 UTC
Re^3: regexp find last word by valentin (Abbot) on Mar 02, 2005 at 21:12 UTC
Here is what I measure on my PPC:
`use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese( -2, {
    holli   => sub { $_ = "fox comes and fox goes into forest"; s/(fox.+)?(fox.+?forest)/${1}the $2/; },
    Roy     => sub { $_ = "fox comes and fox goes into forest"; s/(?=fox(?:(?!fox).)*forest)/the /; },
    ikegami => sub { $_ = "fox comes and fox goes into forest"; s/(fox(?:(?!fox).)*forest)/the $1/; },
    borisz  => sub { $_ = reverse "fox comes and fox goes into forest"; s/(tserof.*?)xof/${1}xof eht/; $_ = reverse; }
} );
__OUTPUT__
           Rate   holli ikegami     Roy  borisz
holli   62934/s      --     -3%    -25%    -31%
ikegami 65121/s      3%      --    -23%    -29%
Roy     84099/s     34%     29%      --     -8%
borisz  91530/s     45%     41%      9%      --`
Re: regexp find last word by holli (Abbot) on Mar 02, 2005 at 14:33 UTC
`s/(fox.+)(fox.+?forest)/${1}the $2/;` Your regex matches at the earliest possible position. That weighs more than the laziness caused by the ?. Therefore you need to force it behind the first "fox". holli, /regexed monk/
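The "earliest position beats laziness" point can be seen in a small sketch. The pattern from the original question is not shown here, so `fox.+?forest` is an assumed stand-in for it:

```perl
use strict;
use warnings;

my $text = "fox comes and fox goes into forest";

# Lazy quantifier alone: the engine still starts at the leftmost "fox"
# and only then expands as little as possible.
my ($lazy) = $text =~ /(fox.+?forest)/;
print "lazy only:   '$lazy'\n";

# A greedy prefix that must itself begin with a "fox" pushes the start
# of the captured group to the right.
my ($forced) = $text =~ /fox.+(fox.+?forest)/;
print "with prefix: '$forced'\n";
```

The lazy `.+?` only decides where the match ends; where it starts is fixed by the first position at which any match is possible.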
Re^2: regexp find last word by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:16 UTC
You should ?-quantify the first parenthesized expression, so it works for one fox as well: `s/(fox.+)?(fox.+?forest)/${1}the $2/` Caution: Contents may have been coded under pressure.
Re^2: regexp find last word by manav (Scribe) on Mar 02, 2005 at 15:39 UTC
What happens in the case of `fox comes and the fox goes into forest`? The code given above will put a "the" in front of every last fox, irrespective of whether a "the" was already there.
`while(<DATA>) {
    s/(?<!the )fox(?!.*fox)/the fox/ ;
    print;
}
__DATA__
the fox comes and fox goes into forest
fox comes and fox goes into forest
the fox comes and the fox goes into forest
fox comes and the fox goes into forest`
Manav
Re^2: regexp find last word by Anonymous Monk on Mar 02, 2005 at 14:55 UTC
That works, but it will not work if you have two foxes and two forests and you want to slap a `/g` on it.
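A sketch of the `/g` problem, using a made-up test sentence with two fox-to-forest stretches. The second pattern is the tempered one ikegami posts elsewhere in this thread:

```perl
use strict;
use warnings;

my $text = "fox runs and fox goes into forest, then fox hides and fox leaves forest";

# Greedy-prefix version with /g: the prefix swallows everything up to the
# last qualifying fox, so only one replacement happens.
(my $greedy = $text) =~ s/(fox.+)(fox.+?forest)/${1}the $2/g;

# Tempered version with /g: each match is confined to a single
# fox-to-forest stretch, so every stretch is handled.
(my $tempered = $text) =~ s/(fox(?:(?!fox).)*forest)/the $1/g;

print "$greedy\n$tempered\n";
```

With the greedy prefix, the first match already consumes the whole line, so `/g` has nothing left to find.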
Re: regexp find last word by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:00 UTC
Using lookahead, you avoid replacing things with themselves: `s/(?=fox(?!.*fox.*forest).*forest)/the /;` If the next thing you see is fox, and it is not followed by some string containing fox followed later by forest, and it is followed by forest, then stick in "the ". Alternatively: `s/(?=fox(?:(?!fox).)*forest)/the /;` If the next thing you see is fox, and it is followed by a sequence of characters, none of which starts another fox, and then you see forest, stick in "the ". Caution: Contents may have been coded under pressure.
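To try the second form, here is a minimal sketch with the `*` quantifier written out as in ikegami's reply. The wrapper name `add_the_lookahead` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the zero-width substitution.
sub add_the_lookahead {
    my ($text) = @_;
    # (?=...) matches a position only, so nothing is replaced with itself.
    # (?:(?!fox).)* consumes characters one at a time, as long as no "fox"
    # starts at that character.
    $text =~ s/(?=fox(?:(?!fox).)*forest)/the /;
    return $text;
}

print add_the_lookahead("fox comes and fox goes into forest"), "\n";
print add_the_lookahead("pig comes and fox goes into forest"), "\n";
print add_the_lookahead("fox goes home"), "\n";
```

Since the match is zero-width, the replacement is just the inserted text; no captures are needed.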
Re^2: regexp find last word by ikegami (Patriarch) on Mar 02, 2005 at 15:43 UTC
or `s/(fox(?:(?!fox).)*forest)/the $1/;` Update: nm, capture makes it slower.
Re: regexp find last word by sh1tn (Priest) on Mar 02, 2005 at 14:50 UTC
Possible positive lookbehind:
`$_ = "fox comes and fox goes into forest";
#s/(.+)(fox)/$1 the $2/;
#s/(?<=and)(?:\s+fox)/ the/;
s/(?<=fox)(.+?)(fox)/$1 the $2/;`
Re: regexp find last word by Anonymous Monk on Mar 02, 2005 at 14:57 UTC
`s/(fox[^f]*(?:f(?!ox)[^f]*)*forest)/the $1/`
Re: regexp find last word by artist (Parson) on Mar 02, 2005 at 15:05 UTC
A generic version. `$_ = "fox comes and fox goes into forest"; s!(\b(\w+)\b.*?)\2!$1the $2!; print $_;`
Re^2: regexp find last word by Anonymous Monk on Mar 02, 2005 at 15:37 UTC
That puts 'the' in front of the first word that appears twice, which doesn't need to be 'fox', nor does it have anything to do with the last fox before the forest. It would fail on: `"fox goes into forest"`
Re: regexp find last word by gube (Parson) on Mar 02, 2005 at 15:01 UTC
Hi, try this: `$a = "fox comes and fox goes into forest"; $a =~ s#(fox.*?)(fox.*?forest)#$1the $2#gsi; print $a;`
Re^2: regexp find last word by Anonymous Monk on Mar 02, 2005 at 15:34 UTC
Considering the regex doesn't contain 'forest', how on earth would it find the last fox before the forest? Your regex just puts 'the' in front of the second fox in the sentence. Which may work in the example, but doesn't solve the problem as stated.
Re: regexp find last word by chas (Priest) on Mar 02, 2005 at 15:15 UTC
`$_ = "fox comes and fox goes into forest"; s/(.*)(fox)/$1the $2/; print;` chas (Update: I didn't see all the other responses till I had posted or I wouldn't have posted.)
Re^2: regexp find last word by Anonymous Monk on Mar 02, 2005 at 15:39 UTC
That puts 'the' in front of the last fox in the line, which doesn't have to be the last fox before the forest. Amazing. Such a relatively simple question, and an incredible number of wrong answers posted, even after valid answers have been posted as well. Makes you wonder how useful Perlmonks is for people trying to learn Perl. (Not very)
Re^3: regexp find last word by chas (Priest) on Mar 02, 2005 at 17:40 UTC
Yes, you're correct, and that did occur to me when I wrote the code. However, in the present case the last "fox" is the same as the last "fox" before "forest". Of course, if you put more "fox"s after "forest", then that changes things, but also suppose there are several "forests" and "fox"s. Then do we want every "fox" that is the last before some "forest", or just the last such case, etc.? Without really exact requirements it's unclear. I guessed that the real question posed by the original poster had to do with finding the last "fox", and that the fact that he described it as the last "fox" "relative to forest" was coincidental. But looking at the original post again, I think your objection is justified. (Update: As far as Perl Monks being useful for learning, I find it extremely so. Sometimes the most interesting posts/replies involve some incorrect tries and then corrections. It is often quite useful to see some errors and realize just what went wrong, in addition to some really slick solution.) chas
Re^4: regexp find last word by Anonymous Monk on Mar 03, 2005 at 10:30 UTC

