PerlMonks  

regexp find last word

by Murcia (Monk)
on Mar 02, 2005 at 14:27 UTC ( [id://435836]=perlquestion )

Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need a hint for a regexp:
$_ = "fox comes and fox goes into forest"; s/(fox.+?forest)/the $1/; print;

prints: the fox comes and fox goes into forest

but what I want is: fox comes and the fox goes into forest

That is, I want to put the word "the" before the fox nearest to "forest" (the last fox before it).

I know this should be simple, but I have a mental block somewhere and cannot find the solution. Best regards, Murcia

Replies are listed 'Best First'.
Re: regexp find last word
by borisz (Canon) on Mar 02, 2005 at 14:43 UTC
    Reverse the string and replace the first:
    $_ = "fox comes and fox goes into forest"; $_ = reverse; s/xof/xof eht/; $_ = reverse; print;
    Boris
      That just puts 'the' in front of the last fox, without considering the forest at all. It doesn't put 'the' in front of the last fox before a forest, except when that also happens to be the last fox on the line. That is, it will turn
      fox comes and fox goes into the forest. No fox left
      into
      fox comes and fox goes into the forest. No the fox left

        What, you mean he didn't see the forest for the thes?

        That's right, I was just too lazy to reverse forest.* as well. But the idea is to reverse the string and replace the first match.
        Boris
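A minimal sketch of the reversed-string idea with "forest" taken into account, using the reversed pattern borisz gives later in the thread. The helper name `the_fox_reversed` is hypothetical. Note that the leftmost "tserof" in the reversed text is the last "forest" of the original, so with several forests this sketch only handles the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: put "the" before the last "fox" that precedes the
# last "forest", by working on the reversed string.
sub the_fox_reversed {
    my ($text) = @_;
    my $rev = reverse $text;    # scalar context: reverses the characters
    # In the reversed text "tserof" comes first; the lazy .*? then stops at
    # the nearest "xof", i.e. the last "fox" before that "forest".
    $rev =~ s/(tserof.*?)xof/${1}xof eht/;
    return scalar reverse $rev;
}

print the_fox_reversed("fox comes and fox goes into forest"), "\n";
# fox comes and the fox goes into forest
print the_fox_reversed("fox comes and fox goes into the forest. No fox left"), "\n";
# fox comes and the fox goes into the forest. No fox left
```

The trailing "No fox left" is untouched because the substitution only starts at "tserof".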
Re: regexp find last word
by borisz (Canon) on Mar 02, 2005 at 14:49 UTC
    Or eat more chars at the start.
    $_ = "fox comes and fox goes into forest"; s/(.*)(fox.+?forest)/$1the $2/; print;
    Boris
      Basically the same principle as my answer, but slightly slower.
      use Benchmark;

      print "holli\n";
      timethis (999999, sub {
          $_ = "fox comes and fox goes into forest";
          s/(fox.+)(fox.+?forest)/${1}the $2/;
      });
      print "borisz\n";
      timethis (999999, sub {
          $_ = "fox comes and fox goes into forest";
          s/(.*)(fox.+?forest)/$1the $2/;
      });
      print "sh1tn\n";
      timethis (999999, sub {
          $_ = "fox comes and fox goes into forest";
          s/(?<=fox)(.+?)(fox)/$1the $2/;
      });
      print "Roy Johnson\n";
      timethis (999999, sub {
          $_ = "fox comes and fox goes into forest";
          s/(?=fox(?:(?!.*fox).)*forest)/the /;
      });
      prints:
      holli
      timethis 999999: 10 wallclock secs ( 9.87 usr + 0.00 sys = 9.87 CPU) @ 101275.98/s (n=999999)
      borisz
      timethis 999999: 14 wallclock secs (11.08 usr + 0.00 sys = 11.08 CPU) @ 90268.91/s (n=999999)
      sh1tn
      timethis 999999: 12 wallclock secs (10.41 usr + 0.01 sys = 10.42 CPU) @ 95959.98/s (n=999999)
      Roy Johnson
      timethis 999999: 10 wallclock secs (10.19 usr + 0.02 sys = 10.20 CPU) @ 98000.69/s (n=999999)
      /me wins ;-)


      holli, /regexed monk/

        That's all well for the speed, but let's check correctness:

        Testing string: fox comes and fox goes into forest

        Holli  : fox comes and the fox goes into forest
        Roy    : fox comes and the fox goes into forest
        Borisz : fox comes and the fox goes into forest
        sh1tn  : fox comes and the fox goes into forest

        Every solution got that one right.

        Testing string: fox comes fox walks and fox goes into forest

        Holli  : fox comes fox walks and the fox goes into forest
        Roy    : fox comes fox walks and the fox goes into forest
        Borisz : fox comes fox walks and the fox goes into forest
        sh1tn  : fox comes the fox walks and fox goes into forest

        Three foxes trip up sh1tn.

        Testing string: pig comes and fox goes into forest

        Holli  : pig comes and fox goes into forest
        Roy    : pig comes and the fox goes into forest
        Borisz : pig comes and the fox goes into forest
        sh1tn  : pig comes and fox goes into forest

        And just one fox trips up both Holli and sh1tn.

        So I'm just going to pretend I have judging power and disqualify Holli's and sh1tn's entries, which makes Roy Johnson the new winner. :)
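A correctness check along these lines could be sketched as below. It assumes the substitutions as each monk posted them in this thread (Roy's in its tempered form, sh1tn's with the replacement written as `$1the $2`); the `%candidate` table and the loop are illustrative, not the script that produced the results above.

```perl
use strict;
use warnings;

# Candidate substitutions keyed by author. Each sub edits and returns a copy.
my %candidate = (
    holli  => sub { my $s = shift; $s =~ s/(fox.+)(fox.+?forest)/${1}the $2/; $s },
    borisz => sub { my $s = shift; $s =~ s/(.*)(fox.+?forest)/$1the $2/;      $s },
    sh1tn  => sub { my $s = shift; $s =~ s/(?<=fox)(.+?)(fox)/$1the $2/;      $s },
    Roy    => sub { my $s = shift; $s =~ s/(?=fox(?:(?!fox).)*forest)/the /;  $s },
);

my @tests = (
    "fox comes and fox goes into forest",
    "fox comes fox walks and fox goes into forest",
    "pig comes and fox goes into forest",
);

for my $input (@tests) {
    print "Testing string: $input\n";
    for my $name (sort keys %candidate) {
        printf "  %-6s : %s\n", $name, $candidate{$name}->($input);
    }
}
```

Working on a copy inside each sub keeps the test strings intact between candidates.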

        With corrected versions of yours and mine, plus ikegami's suggested alteration of mine:
        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        cmpthese( -2, {
            holli => sub {
                $_ = "fox comes and fox goes into forest";
                s/(fox.+)?(fox.+?forest)/${1}the $2/;
            },
            Roy => sub {
                $_ = "fox comes and fox goes into forest";
                s/(?=fox(?:(?!fox).)*forest)/the /;
            },
            ikegami => sub {
                $_ = "fox comes and fox goes into forest";
                s/(fox(?:(?!fox).)*forest)/the $1/;
            },
        });

                   Rate ikegami   holli     Roy
        ikegami 34714/s      --     -2%    -27%
        holli   35541/s      2%      --    -25%
        Roy     47659/s     37%     34%      --

        Caution: Contents may have been coded under pressure.

        You can remove the (redundant) .* from Roy Johnson's.

        I don't know if it helps any, but you can remove the lookahead too:
        s/(fox(?:(?!fox).)*forest)/the $1/;

        Here is what I measure on my PPC:
        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        cmpthese( -2, {
            holli => sub {
                $_ = "fox comes and fox goes into forest";
                s/(fox.+)?(fox.+?forest)/${1}the $2/;
            },
            Roy => sub {
                $_ = "fox comes and fox goes into forest";
                s/(?=fox(?:(?!fox).)*forest)/the /;
            },
            ikegami => sub {
                $_ = "fox comes and fox goes into forest";
                s/(fox(?:(?!fox).)*forest)/the $1/;
            },
            borisz => sub {
                $_ = reverse "fox comes and fox goes into forest";
                s/(tserof.*?)xof/${1}xof eht/;
                $_ = reverse;
            },
        } );

        __OUTPUT__
                   Rate   holli ikegami     Roy  borisz
        holli   62934/s      --     -3%    -25%    -31%
        ikegami 65121/s      3%      --    -23%    -29%
        Roy     84099/s     34%     29%      --     -8%
        borisz  91530/s     45%     41%      9%      --
Re: regexp find last word
by holli (Abbot) on Mar 02, 2005 at 14:33 UTC
    s/(fox.+)(fox.+?forest)/${1}the $2/;
    Your regex matches at the earliest possible position. That takes precedence over the laziness requested by the ?. Therefore you need to force the match to start after the first "fox".
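To illustrate the point, here is a small sketch that compares what the lazy pattern captures with what the forced pattern captures. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = "fox comes and fox goes into forest";

# Lazy .+? only shortens the match from a given starting point; the engine
# still starts at the leftmost "fox" that can lead to a match.
my ($lazy) = $text =~ /(fox.+?forest)/;
print "lazy only : '$lazy'\n";
# lazy only : 'fox comes and fox goes into forest'

# A leading (fox.+) with a greedy .+ pushes the second group to the last
# "fox" that is still followed by "forest".
my ($pre, $forced) = $text =~ /(fox.+)(fox.+?forest)/;
print "forced    : '$forced'\n";
# forced    : 'fox goes into forest'
```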


    holli, /regexed monk/
      You should ?-quantify the first parenthesized expression, so it works for one fox as well:
      s/(fox.+)?(fox.+?forest)/${1}the $2/

      Caution: Contents may have been coded under pressure.
      What happens in the case of
      fox comes and the fox goes into forest
      The code given above will put a "the" in front of the last fox regardless of whether a "the" was already there.

      while (<DATA>) {
          s/(?<!the )fox(?!.*fox)/the fox/;
          print;
      }
      __DATA__
      the fox comes and fox goes into forest
      fox comes and fox goes into forest
      the fox comes and the fox goes into forest
      fox comes and the fox goes into forest


      Manav
      That works, but it will not work if you have two foxes and two forests and you want to slap a /g on it.
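One possible way to address both concerns is sketched below: it keeps the negative lookbehind for an existing "the " and pairs it with the tempered lookahead posted elsewhere in this thread, so that /g can treat each fox/forest pair separately. The helper name `add_the` and the sample sentences are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: insert "the " before each "fox" that is the last one
# before a "forest", unless "the " is already there.
sub add_the {
    my ($text) = @_;
    $text =~ s/(?<!the )(?=fox(?:(?!fox).)*forest)/the /g;
    return $text;
}

print add_the("fox comes and fox goes into forest"), "\n";
# fox comes and the fox goes into forest
print add_the("fox comes and the fox goes into forest"), "\n";
# fox comes and the fox goes into forest
print add_the("fox runs, fox enters forest; fox rests, fox leaves forest"), "\n";
# fox runs, the fox enters forest; fox rests, the fox leaves forest
```

Because the match is zero-width, nothing is replaced with itself; only "the " is inserted.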
Re: regexp find last word
by Roy Johnson (Monsignor) on Mar 02, 2005 at 15:00 UTC
    Using lookahead, you avoid replacing things with themselves:
    s/(?=fox(?!.*fox.*forest).*forest)/the /;
    If the next thing you see is fox, and it is not followed by some string containing fox followed later by forest, and it is followed by forest, then stick in "the ".

    Alternatively:

    s/(?=fox(?:(?!fox).)*forest)/the /;
    If the next thing you see is fox, and it is followed by a sequence of characters, none of which starts another fox, and then you see forest, stick in "the ".
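The second pattern can be spelled out with the /x modifier so that each part carries its own comment. This is only a sketch; the helper name `insert_the` and the sample lines are chosen for illustration.

```perl
use strict;
use warnings;

my $last_fox_before_forest = qr/
    (?=                     # look ahead, consuming nothing
        fox                 # the next thing is "fox"
        (?: (?!fox) . )*    # any characters, none of which starts another "fox"
        forest              # and then "forest"
    )
/x;

# Hypothetical helper: apply the substitution to a copy of the line.
sub insert_the {
    my ($line) = @_;
    $line =~ s/$last_fox_before_forest/the /;
    return $line;
}

print insert_the("fox comes and fox goes into forest"), "\n";
# fox comes and the fox goes into forest
print insert_the("fox comes and fox goes into the forest. No fox left"), "\n";
# fox comes and the fox goes into the forest. No fox left
```

The fox after the forest is left alone because no "forest" follows it.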

    Caution: Contents may have been coded under pressure.

      or s/(fox(?:(?!fox).)*forest)/the $1/;

      Update: nm, capture makes it slower.

Re: regexp find last word
by sh1tn (Priest) on Mar 02, 2005 at 14:50 UTC
    Possible positive lookbehind:
    $_ = "fox comes and fox goes into forest";
    #s/(.+)(fox)/$1 the $2/;
    #s/(?<=and)(?:\s+fox)/ the/;
    s/(?<=fox)(.+?)(fox)/$1 the $2/;


Re: regexp find last word
by Anonymous Monk on Mar 02, 2005 at 14:57 UTC
    s/(fox[^f]*(?:f(?!ox)[^f]*)*forest)/the $1/
Re: regexp find last word
by artist (Parson) on Mar 02, 2005 at 15:05 UTC
    A generic version.
    $_ = "fox comes and fox goes into forest"; s!(\b(\w+)\b.*?)\2!$1the $2!; print $_;
      That puts 'the' in front of the first word that appears twice, which doesn't need to be 'fox', nor does it have anything to do with the last fox before the forest. It would fail on:
      "fox goes into forest"
Re: regexp find last word
by gube (Parson) on Mar 02, 2005 at 15:01 UTC

    Hi Try this,

    $a = "fox comes and fox goes into forest"; $a =~ s#(fox.*?)(fox.*?forest)#$1the $2#gsi; print $a;
      Considering the regex doesn't contain 'forest', how on earth would it find the last fox before the forest? Your regex just puts 'the' in front of the second fox in the sentence. Which may work in the example, but doesn't solve the problem as stated.
Re: regexp find last word
by chas (Priest) on Mar 02, 2005 at 15:15 UTC
    $_ = "fox comes and fox goes into forest"; s/(.*)(fox)/$1the $2/; print;
    chas
    (Update: I didn't see all the other responses till I had posted or I wouldn't have posted.)
      That puts 'the' in front of the last fox in the line, which doesn't have to be the last fox before the forest.

      Amazing. Such a relatively simple question, and an incredible number of wrong answers posted, even after valid answers have been posted as well. Makes you wonder how useful Perlmonks is for people trying to learn Perl. (Not very)

        Yes, you're correct, and that did occur to me when I wrote the code. However, in the present case the last "fox" is the same as the last "fox" before "forest". Of course, if you put more "fox"s after "forest", then that changes things, but also suppose there are several "forests" and "fox"s. Then do we want every "fox" that is the last before *some* "forest" or just the last such case, etc, etc. Without really exact requirements it's unclear. I guessed that the real question posed by the original poster had to do with finding the last "fox", and that the fact that he described it as the last "fox" "relative to forest" was coincidental. But looking at the original post again, I think your objection is justified.
        (Update: As far as Perl Monks being useful for learning, I find it extremely so. Sometimes the most interesting posts/replies involve some incorrect tries and then corrections. It is often quite useful to see some errors and realize just what went wrong in addition to some really slick solution.)
        chas
