http://qs321.pair.com?node_id=630352

isync has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks!

I know of the handy regex /([^:]+)/ which tells perl to eat everything until it reaches the ":".

Now, is there a way to do the same with a phrase, like /([^stop]+)/ ? I couldn't get it to work so far. My current understanding is that it isn't that easy, because perl has to iterate back and forth to emulate phrase matching, and that needs a whole different syntax (which I don't know either, btw)...

Replies are listed 'Best First'.
Re: Tell regex to stop at "phrase", instead of char - how?
by JediWizard (Deacon) on Aug 02, 2007 at 18:08 UTC

    See perlre

    m/(?:(?!somephrase).)*/;

    "A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind."


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol
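A minimal sketch of how the pattern from the reply above could be used to capture everything up to a phrase. The sample string and the variable names ($text, $phrase, $before) are made up for illustration; only the (?:(?!...).)* construct comes from the reply.

```perl
use strict;
use warnings;

# Hypothetical sample input and stop phrase, chosen only for illustration.
my $text   = "eat all of this until stop and not this";
my $phrase = "stop";

# (?!\Q$phrase\E) refuses to consume a character that begins the phrase,
# so the capture ends immediately before the first occurrence.
# \Q...\E quotes any regex metacharacters the phrase might contain.
my ($before) = $text =~ m/^((?:(?!\Q$phrase\E).)*)/;

print "[$before]\n";    # prints "[eat all of this until ]"
```

The ^ anchor is a choice made for this sketch: it forces the match to start at the beginning of the string, which is the usual intent of "eat everything until".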

Re: Tell regex to stop at "phrase", instead of char - how?
by BrowserUk (Patriarch) on Aug 02, 2007 at 21:10 UTC

    Maybe I'm missing something subtle here, but the answers so far seem too complicated.

    If you want to stop at the phrase 'stop', just embed that at the appropriate point in the regex and use a non-greedy 'anything' match:

    $s = "this is a load of junk to be consumed until we get to the word stop. After that nothing should stop it until another stop";;
    print $1 while $s =~ m[(.+?)stop]g;;
    this is a load of junk to be consumed until we get to the word
    . After that nothing should
     it until another

    And if you want the stop phrase to be a part of (say) the next capture, only then do you need to wrap it in a zero-length assertion (ZLA), here a look-ahead:

    print $1 while $s =~ m[(.+?)(?=stop)]g;;
    this is a load of junk to be consumed until we get to the word
    stop. After that nothing should
    stop it until another

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That's fine so long as the regex ends at that point. Backtracking can easily cause problems if that pattern is part of some larger regex (or later becomes part of some larger regex).

      For example, if I want to find only sections of "start...stop" that actually end in "stop now" then non-greedy doesn't try hard enough to not be greedy:

      $_= "start this stop start that stop now start last stop";
      print "non-greedy:\n";
      print " ", $_, $/ for /start(.*?)stop now/g;
      print "look-ahead:\n";
      print " ", $_, $/ for /start((?:(?!stop).)*)stop now/g;
      __END__
      non-greedy:
        this stop start that
      look-ahead:
        that

      - tye        

Re: Tell regex to stop at "phrase", instead of char - how?
by pKai (Priest) on Aug 02, 2007 at 19:15 UTC
    ...regex /([^:]+)/ which tells perl to eat everything until it reaches the ":".

    ...unless the input begins with a colon, in which case the regex starts its eating, i.e. matching, after the leading colon(s).

    So the "exact" equivalent of /([^:]+)/ would be /((?:.(?!stop))+.)/, while the "eat up until" case might be something like /\G((?!stop)(?:.(?!stop))+.)/, corresponding to /([^:]*)/, or so I hope.
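The leading-delimiter behaviour described here can be checked directly. The sketch below is only an illustration: it uses the (?:(?!stop).) form from earlier in the thread rather than the patterns in this reply, and the input strings and variable names are invented.

```perl
use strict;
use warnings;

# Hypothetical inputs that begin with the delimiter / the phrase.
my $colons = ":abc:def";
my $stops  = "stopabcstopdef";

my ($char_plus)   = $colons =~ /([^:]+)/;           # skips the leading colon
my ($char_star)   = $colons =~ /([^:]*)/;           # matches empty at position 0
my ($phrase_star) = $stops  =~ /((?:(?!stop).)*)/;  # also empty at position 0
my ($phrase_plus) = $stops  =~ /((?:(?!stop).)+)/;  # retries one character in

print "char  + : [$char_plus]\n";     # [abc]
print "char  * : [$char_star]\n";     # []
print "phrase *: [$phrase_star]\n";   # []
print "phrase +: [$phrase_plus]\n";   # [topabc]
```

With +, the unanchored phrase version fails at position 0 and succeeds from position 1, so it captures "topabc" rather than "abc". A single character can be skipped as a whole; a phrase cannot, which is why anchoring (with ^ or \G) matters more in the phrase case.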

Re: Tell regex to stop at "phrase", instead of char - how?
by Anonymous Monk on Aug 03, 2007 at 00:13 UTC
    Well, I'm guessing that you want to remove everything before the phrase. Then something like this will do: s/.*?yourPhrase/yourPhrase/ . The .* means match any character (other than a newline, unless /s is used) zero or more times. The ? makes the preceding quantifier non-greedy, which means that if the same phrase occurs twice in the string, the match ends at the first occurrence. So the substitution replaces anything preceding and _including_ yourPhrase with the string yourPhrase, thereby removing the "anything preceding".
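A short sketch of that substitution, assuming an invented input line in which the phrase occurs twice. The second form uses a look-ahead, as mentioned earlier in the thread, so the phrase does not have to be put back by the replacement.

```perl
use strict;
use warnings;

# Hypothetical input; "yourPhrase" is the placeholder used in the reply above.
my $line = "junk junk yourPhrase keep this yourPhrase and this";

# Non-greedy .*? ends at the first occurrence; the replacement restores the phrase.
(my $trimmed = $line) =~ s/.*?yourPhrase/yourPhrase/;

# Variant: the look-ahead leaves the phrase unconsumed, so nothing needs restoring.
(my $trimmed_la = $line) =~ s/.*?(?=yourPhrase)//;

print "$trimmed\n";      # yourPhrase keep this yourPhrase and this
print "$trimmed_la\n";   # same result
```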
Re: Tell regex to stop at "phrase", instead of char - how?
by Anonymous Monk on Aug 02, 2007 at 23:41 UTC
    /([^(stop)]+)/ Not particularly efficient, but it works...

      No, it doesn't work. To work, it not only needs to match when it should match, it also needs to not match when it shouldn't match. Yours fails to do the latter.

      /([^(stop)]+)/ is the same as /([^()opst]+)/. It doesn't look for "stop" at all, just individual characters.

      You basically took the regexp the OP said didn't work, and made it not work in more places. For example, your solution stops wherever a paren is encountered.
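The point about the character class can be seen with a small test. This is a sketch with an invented input string; the look-ahead pattern is the one suggested earlier in the thread, included for comparison.

```perl
use strict;
use warnings;

# Hypothetical input: the phrase "stop" appears only at the very end.
my $text = "read (everything) up to stop";

my ($class_match)  = $text =~ /([^(stop)]+)/;      # class of the chars ( s t o p )
my ($class_sorted) = $text =~ /([^()opst]+)/;      # the same class, reordered
my ($phrase_match) = $text =~ /((?:(?!stop).)+)/;  # stops only at the phrase

print "class : [$class_match]\n";    # [read ]
print "sorted: [$class_sorted]\n";   # [read ]
print "phrase: [$phrase_match]\n";   # [read (everything) up to ]
```

Both character-class patterns give up at the first "(", long before the phrase, while the look-ahead version runs until "stop" itself.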

Re: Tell regex to stop at "phrase", instead of char - how?
by naikonta (Curate) on Aug 03, 2007 at 13:51 UTC
    If it's really "get everything before a certain phrase" then you can also use split instead of regex.
    $_ = 'nothing can stop the bus on the road';
    print +(split /stop/)[0], "\n";
    print +(split /bus/)[0], "\n";

    # output (with trailing space)
    nothing can
    nothing can stop the

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: Tell regex to stop at "phrase", instead of char - how?
by isync (Hermit) on Aug 03, 2007 at 09:25 UTC
    Thanks everyone! I especially like /([^(stop)]+)/ (is it really more inefficient than the other solutions..?)