Re: regexp find last word
by borisz (Canon) on Mar 02, 2005 at 14:43 UTC
|
Reverse the string and replace the first:
$_ = "fox comes and fox goes into forest";
$_ = reverse;
s/xof/xof eht/;
$_ = reverse;
print;
| [reply] [d/l] |
|
That just puts 'the' in front of the last fox, without considering the forest at all. It doesn't put 'the' in front of the last fox before a forest, except when that also happens to be the last fox on the line. That is, it will turn
fox comes and fox goes into the forest. No fox left
into
fox comes and fox goes into the forest. No the fox left
| [reply] [d/l] [select] |
|
That's right, I was just too lazy to reverse forest.* as well. But the idea is to reverse the string and replace the first match.
| [reply] |
Re: regexp find last word
by borisz (Canon) on Mar 02, 2005 at 14:49 UTC
|
Or eat more chars at the start.
$_ = "fox comes and fox goes into forest";
s/(.*)(fox.+?forest)/$1the $2/;
print;
| [reply] [d/l] |
|
Basically the same principle as my answer, but slightly slower.
use Benchmark;
print "holli\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(fox.+)(fox.+?forest)/${1}the $2/; });
print "borisz\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(.*)(fox.+?forest)/$1the $2/; });
print "sh1tn\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(?<=fox)(.+?)(fox)/$1the $2/; });
print "Roy Johnson\n";
timethis (999999, sub { $_ = "fox comes and fox goes into forest"; s/(?=fox(?:(?!.*fox).)*forest)/the /; });
prints:
holli
timethis 999999: 10 wallclock secs ( 9.87 usr + 0.00 sys = 9.87 CPU) @ 101275.98/s (n=999999)
borisz
timethis 999999: 14 wallclock secs (11.08 usr + 0.00 sys = 11.08 CPU) @ 90268.91/s (n=999999)
sh1tn
timethis 999999: 12 wallclock secs (10.41 usr + 0.01 sys = 10.42 CPU) @ 95959.98/s (n=999999)
Roy Johnson
timethis 999999: 10 wallclock secs (10.19 usr + 0.02 sys = 10.20 CPU) @ 98000.69/s (n=999999)
/me wins ;-)
| [reply] [d/l] [select] |
|
That's all well for the speed, but let's check correctness:
Testing string: fox comes and fox goes into forest
Holli  : fox comes and the fox goes into forest
Roy    : fox comes and the fox goes into forest
Borisz : fox comes and the fox goes into forest
sh1tn  : fox comes and the fox goes into forest
Every solution got that one right.
Testing string: fox comes fox walks and fox goes into forest
Holli  : fox comes fox walks and the fox goes into forest
Roy    : fox comes fox walks and the fox goes into forest
Borisz : fox comes fox walks and the fox goes into forest
sh1tn  : fox comes the fox walks and fox goes into forest
Three foxes trip up sh1tn.
Testing string: pig comes and fox goes into forest
Holli  : pig comes and fox goes into forest
Roy    : pig comes and the fox goes into forest
Borisz : pig comes and the fox goes into forest
sh1tn  : pig comes and fox goes into forest
And just one fox trips up both Holli and sh1tn.
So I'm just going to pretend I have judging power and disqualify Holli's and sh1tn's entries, which makes Roy Johnson the new winner. :)
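The code behind this check was not posted, so the following is only a sketch of how such a comparison could be run. The four substitutions are copied from the benchmark above; the `%candidate` hash, the `@order` list and the output format are assumptions made for this example.

```perl
use strict;
use warnings;

# Each candidate takes a string and returns a substituted copy.
# The regexes are the ones used in the benchmark post.
my %candidate = (
    Holli  => sub { my $s = shift; $s =~ s/(fox.+)(fox.+?forest)/${1}the $2/;  $s },
    Roy    => sub { my $s = shift; $s =~ s/(?=fox(?:(?!.*fox).)*forest)/the /; $s },
    Borisz => sub { my $s = shift; $s =~ s/(.*)(fox.+?forest)/$1the $2/;       $s },
    sh1tn  => sub { my $s = shift; $s =~ s/(?<=fox)(.+?)(fox)/$1the $2/;       $s },
);
my @order = qw(Holli Roy Borisz sh1tn);

my @tests = (
    "fox comes and fox goes into forest",
    "fox comes fox walks and fox goes into forest",
    "pig comes and fox goes into forest",
);

for my $input (@tests) {
    print "Testing string: $input\n";
    # Print every candidate's result for the same input, one per line.
    printf "%-7s: %s\n", $_, $candidate{$_}->($input) for @order;
}
```

Working on a copy inside each sub keeps the test strings untouched, so every candidate sees the same input.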
| [reply] [d/l] [select] |
|
|
With corrected versions of yours and mine, plus ikegami's suggested alteration of mine:
use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese( -2, {
    holli => sub {
        $_ = "fox comes and fox goes into forest";
        s/(fox.+)?(fox.+?forest)/${1}the $2/;
    },
    Roy => sub {
        $_ = "fox comes and fox goes into forest";
        s/(?=fox(?:(?!fox).)*forest)/the /;
    },
    ikegami => sub {
        $_ = "fox comes and fox goes into forest";
        s/(fox(?:(?!fox).)*forest)/the $1/;
    },
});
           Rate ikegami   holli     Roy
ikegami 34714/s      --     -2%    -27%
holli   35541/s      2%      --    -25%
Roy     47659/s     37%     34%      --
Caution: Contents may have been coded under pressure.
| [reply] [d/l] [select] |
|
You can remove the (redundant) .* from Roy Johnson's.
I don't know if it helps any, but you can remove the lookahead too:
s/(fox(?:(?!fox).)*forest)/the $1/;
| [reply] [d/l] |
|
|
Here is what I measure on my PPC:
use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese(
    -2,
    {
        holli => sub {
            $_ = "fox comes and fox goes into forest";
            s/(fox.+)?(fox.+?forest)/${1}the $2/;
        },
        Roy => sub {
            $_ = "fox comes and fox goes into forest";
            s/(?=fox(?:(?!fox).)*forest)/the /;
        },
        ikegami => sub {
            $_ = "fox comes and fox goes into forest";
            s/(fox(?:(?!fox).)*forest)/the $1/;
        },
        borisz => sub {
            $_ = reverse "fox comes and fox goes into forest";
            s/(tserof.*?)xof/${1}xof eht/;
            $_ = reverse;
        }
    }
);
__OUTPUT__
           Rate   holli ikegami     Roy  borisz
holli   62934/s      --     -3%    -25%    -31%
ikegami 65121/s      3%      --    -23%    -29%
Roy     84099/s     34%     29%      --     -8%
borisz  91530/s     45%     41%      9%      --
| [reply] [d/l] |
Re: regexp find last word
by holli (Abbot) on Mar 02, 2005 at 14:33 UTC
|
s/(fox.+)(fox.+?forest)/${1}the $2/;
Your regex matches at the earliest possible position, and that takes precedence over the laziness requested by the ?. Therefore you need to force the match to start after the first "fox".
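To see the leftmost-match rule in action, here is a small sketch. The variable names are made up for this example; the second pattern is the regex above with only the last part captured.

```perl
use strict;
use warnings;

my $text = "fox comes and fox goes into forest";

# A lazy quantifier only shortens the match from a given starting point;
# the engine still starts at the leftmost position that can succeed.
my ($lazy_only) = $text =~ /(fox.+?forest)/;
print "$lazy_only\n";

# Consuming the first fox (and as much as possible after it) greedily
# pushes the captured part to the last fox before the forest.
my ($forced) = $text =~ /fox.+(fox.+?forest)/;
print "$forced\n";
```

The first capture should cover the whole sentence, because the match begins at the first fox; the second should cover only "fox goes into forest".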
| [reply] [d/l] |
|
You should ?-quantify the first parenthesized expression, so it works for one fox as well:
s/(fox.+)?(fox.+?forest)/${1}the $2/
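One thing to watch with the optional group: when there is only one fox, the group does not take part in the match, so $1 should be undefined and `use warnings` would likely complain about an uninitialized value in the replacement. A possible variant, offered only as a sketch, moves the optional part inside the capture so that $1 is always defined. The helper name `add_the` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, named for this example only.
sub add_the {
    my ($s) = @_;
    # The optional part sits inside the capturing group, so $1 is an
    # empty string (not undef) when there is no earlier fox.
    $s =~ s/((?:fox.+)?)(fox.+?forest)/${1}the $2/;
    return $s;
}

print add_the("fox comes and fox goes into forest"), "\n";
print add_the("pig comes and fox goes into forest"), "\n";
```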
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
|
What happens in the case of
fox comes and the fox goes into forest
The code given above will put a "the" in front of the last fox regardless of whether a "the" was already there. A negative lookbehind avoids that:
while(<DATA>) {
    s/(?<!the )fox(?!.*fox)/the fox/;
    print;
}
__DATA__
the fox comes and fox goes into forest
fox comes and fox goes into forest
the fox comes and the fox goes into forest
fox comes and the fox goes into forest
Manav
| [reply] [d/l] |
|
That works, but it will not work if you have two foxes and two forests and you want to slap a /g on it.
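A sketch of that limitation, together with one possible way around it. The second substitution combines the lookbehind guard with the per-forest lookahead posted elsewhere in this thread; that combination is an assumption of this example, not something anyone posted.

```perl
use strict;
use warnings;

my $text = "fox comes into forest and fox goes into forest";

# (?!.*fox) is only true for the very last fox on the line,
# so /g still changes just one fox.
(my $line_based = $text) =~ s/(?<!the )fox(?!.*fox)/the fox/g;
print "$line_based\n";

# Assumed combination: keep the "no 'the ' already there" guard, but look
# for "a fox with no other fox before the next forest" instead.
(my $per_forest = $text) =~ s/(?<!the )(?=fox(?:(?!fox).)*forest)/the /g;
print "$per_forest\n";

# A fox that already has its "the " should be left alone.
(my $guarded = "the fox comes into forest and fox goes into forest")
    =~ s/(?<!the )(?=fox(?:(?!fox).)*forest)/the /g;
print "$guarded\n";
```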
| [reply] |
Re: regexp find last word
by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:00 UTC
|
Using lookahead, you avoid replacing things with themselves:
s/(?=fox(?!.*fox.*forest).*forest)/the /;
If the next thing you see is fox, and it is not followed by some string containing fox followed later by forest, and it is followed by forest, then stick in "the ".
Alternatively:
s/(?=fox(?:(?!fox).)*forest)/the /;
If the next thing you see is fox, and it is followed by a sequence of characters, none of which starts another fox, and then you see forest, stick in "the ".
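The two patterns agree on a single fox/forest pair but should differ once /g is added and the line holds more than one pair. The sketch below tries both; the helper names and the added /g are assumptions of this example.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this comparison.
sub first_pattern {
    my ($s) = @_;
    # Rejects any fox that still has a later "fox ... forest" ahead of it.
    $s =~ s/(?=fox(?!.*fox.*forest).*forest)/the /g;
    return $s;
}

sub second_pattern {
    my ($s) = @_;
    # Accepts any fox with no other fox between it and the next forest.
    $s =~ s/(?=fox(?:(?!fox).)*forest)/the /g;
    return $s;
}

for my $text (
    "fox comes and fox goes into the forest. No fox left",
    "fox into forest. fox into forest",
) {
    print "input : $text\n";
    print "first : ", first_pattern($text),  "\n";
    print "second: ", second_pattern($text), "\n";
}
```

On the first string both should mark only the fox that goes into the forest. On the second, the first pattern marks only the final pair, while the second marks the last fox before each forest.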
Caution: Contents may have been coded under pressure.
| [reply] [d/l] [select] |
Re: regexp find last word
by sh1tn (Priest) on Mar 02, 2005 at 14:50 UTC
|
Possible positive lookbehind:
$_ = "fox comes and fox goes into forest";
#s/(.+)(fox)/$1 the $2/;
#s/(?<=and)(?:\s+fox)/ the/;
s/(?<=fox)(.+?)(fox)/$1 the $2/;
| [reply] [d/l] |
Re: regexp find last word
by Anonymous Monk on Mar 02, 2005 at 14:57 UTC
|
s/(fox[^f]*(?:f(?!ox)[^f]*)*forest)/the $1/
| [reply] [d/l] |
Re: regexp find last word
by artist (Parson) on Mar 02, 2005 at 15:05 UTC
|
$_ = "fox comes and fox goes into forest";
s!(\b(\w+)\b.*?)\2!$1the $2!;
print $_;
| [reply] [d/l] |
|
That puts 'the' in front of the second occurrence of the first word that appears twice. That word doesn't need to be 'fox', and it has nothing to do with the last fox before the forest. It would fail on:
"fox goes into forest"
| [reply] [d/l] |
Re: regexp find last word
by gube (Parson) on Mar 02, 2005 at 15:01 UTC
|
$a = "fox comes and fox goes into forest";
$a =~ s#(fox.*?)(fox.*?forest)#$1the $2#gsi;
print $a;
| [reply] [d/l] |
|
Considering the regex doesn't contain 'forest', how on earth would it find the last fox before the forest? Your regex just puts 'the' in front of the second fox in the sentence. Which may work in the example, but doesn't solve the problem as stated.
| [reply] |
Re: regexp find last word
by chas (Priest) on Mar 02, 2005 at 15:15 UTC
|
$_ = "fox comes and fox goes into forest";
s/(.*)(fox)/$1the $2/;
print;
chas
(Update: I didn't see all the other responses till I had posted or I wouldn't have posted.)
| [reply] [d/l] |
|
| [reply] |
|
Yes, you're correct, and that did occur to me when I wrote the code. However, in the present case the last "fox" is the same as the last "fox" before "forest". Of course, if you put more "fox"s after "forest", then that changes things, but also suppose there are several "forests" and "fox"s. Then do we want every "fox" that is the last before *some* "forest" or just the last such case, etc, etc. Without really exact requirements it's unclear. I guessed that the real question posed by the original poster had to do with finding the last "fox", and that the fact that he described it as the last "fox" "relative to forest" was coincidental. But looking at the original post again, I think your objection is justified.
(Update: As far as Perl Monks being useful for learning, I find it extremely so. Sometimes the most interesting posts/replies involve some incorrect tries and then corrections. It is often quite useful to see some errors and realize just what went wrong in addition to some really slick solution.)
chas
| [reply] |
|