Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: positive look behind regexp mystery (\K assertion)

by lodin (Hermit)
on Aug 14, 2008 at 12:54 UTC ( #704341=note: print w/replies, xml ) Need Help??


in reply to positive look behind regexp mystery

This is a great case for the \K assertion (update: forgot to mention that \K is new for 5.10 but available to "everyone" via Regexp::Keep by Jeff Pinyan who come up with the idea (I don't know if that will provide you the same efficiency though)). Not only is it easier, but it's also more efficient due to the optimizations of the regexp engine. The pattern would look like this:

s/\.\K[^.]*$/txt/;
The great part with this is that the engine can start looking for a literal (the dot) and avoid a lot of backtracking. The output of use re 'debug'; will visualize this.

With the look-behind pattern, you see there's a lot of backtracking going on, and the engine guesses a match at the beginning of the string (the string is "xyz.foo" in the examples below).

Compiling REx "(?<=[.])[^.]*$" Final program: 1: IFMATCH[-1] (7) 3: EXACT <.> (5) 5: SUCCEED (0) 6: TAIL (7) 7: STAR (19) 8: ANYOF[\0-\-/-\377{unicode_all}] (0) 19: EOL (20) 20: END (0) floating ""$ at 0..2147483647 (checking floating) minlen 0 Guessing start of match in sv for REx "(?<=[.])[^.]*$" against "xyz.fo +o" Found floating substr ""$ at offset 7... Guessed: match at offset 0 Matching REx "(?<=[.])[^.]*$" against "xyz.foo" 0 <> <xyz.foo> | 1:IFMATCH[-1](7) failed... 1 <x> <yz.foo> | 1:IFMATCH[-1](7) 0 <> <xyz.foo> | 3: EXACT <.>(5) failed... failed... 2 <xy> <z.foo> | 1:IFMATCH[-1](7) 1 <x> <yz.foo> | 3: EXACT <.>(5) failed... failed... 3 <xyz> <.foo> | 1:IFMATCH[-1](7) 2 <xy> <z.foo> | 3: EXACT <.>(5) failed... failed... 4 <xyz.> <foo> | 1:IFMATCH[-1](7) 3 <xyz> <.foo> | 3: EXACT <.>(5) 4 <xyz.> <foo> | 5: SUCCEED(0) subpattern success... 4 <xyz.> <foo> | 7:STAR(19) ANYOF[\0-\-/-\377{unicode_all}] can +match 3 times out of 2147483647... 7 <xyz.foo> <> | 19: EOL(20) 7 <xyz.foo> <> | 20: END(0) Match successful!
However, if we look at the \K pattern, get get this:
Compiling REx "\.\K[^.]*$" Final program: 1: EXACT <.> (3) 3: KEEPS (4) 4: STAR (16) 5: ANYOF[\0-\-/-\377{unicode_all}] (0) 16: EOL (17) 17: END (0) anchored "." at 0 floating ""$ at 1..2147483647 (checking anchored) mi +nlen 1 Guessing start of match in sv for REx "\.\K[^.]*$" against "xyz.foo" Found anchored substr "." at offset 3... Found floating substr ""$ at offset 7... Starting position does not contradict /^/m... Guessed: match at offset 3 Matching REx "\.\K[^.]*$" against ".foo" 3 <xyz> <.foo> | 1:EXACT <.>(3) 4 <xyz.> <foo> | 3:KEEPS(4) 4 <xyz.> <foo> | 4: STAR(16) ANYOF[\0-\-/-\377{unicode_all}] ca +n match 3 times out of 2147483647... 7 <xyz.foo> <> | 16: EOL(17) 7 <xyz.foo> <> | 17: END(0) Match successful!
That's nice. No backtracking.

lodin

Replies are listed 'Best First'.
Re^2: positive look behind regexp mystery (\K assertion)
by rovf (Priest) on Aug 14, 2008 at 13:28 UTC
    This is a great case for the \K assertion.

    I have never heard of \K and can't find it in perlre. Is this a very new feature?

    -- 
    Ronald Fischer <ynnor@mm.st>
      It was added in 5.10.0.
      The doc you linked to (perlre) has tons of references to \K. :-) You, like me, must be still at perl5.8, and we don't have it in our docs. But if you are in 5.10, then read again :-)
      []s, HTH, Massa (κς,πμ,πλ)
        You, like me, must be still at perl5.8

        Correct - the project I'm working on, requires compatibility to Perl 5.8.7, so I did not check the 5.10 docs. Thanks for pointing this out.

        -- 
        Ronald Fischer <ynnor@mm.st>

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://704341]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2022-01-23 08:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (63 votes). Check out past polls.

    Notices?