This is a great case for the \K assertion. (Update: I forgot to mention that \K is new in Perl 5.10, but it is available to "everyone" via Regexp::Keep by Jeff Pinyan, who came up with the idea. I don't know whether that module gives you the same efficiency, though.) Not only is \K easier to use than the look-behind, it's also more efficient, thanks to the optimizations in the regexp engine. The pattern would look like this:

    /\.\K[^.]*$/
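
A minimal sketch comparing the two patterns, assuming Perl 5.10 or later for \K. The sample string "xyz.foo" is the one used in the debug traces below; the helper matched_text is hypothetical and exists only for illustration. It reads the match offsets from @- and @+, which shows that both patterns report the same text: \K drops everything matched before it from the reported match, which is also why a substitution keeps the dot.

```perl
use strict;
use warnings;
use 5.010;    # \K was added in Perl 5.10

# Hypothetical helper: return the text matched by $re in $str, or undef.
# $-[0] and $+[0] are the start and end offsets of the whole match;
# \K moves $-[0] past the part matched before it.
sub matched_text {
    my ($str, $re) = @_;
    return $str =~ $re ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

my $str = 'xyz.foo';

# Look-behind: require a dot before the match without consuming it.
my $lb = matched_text($str, qr/(?<=[.])[^.]*$/);

# \K: match the dot normally, then drop it from the reported match.
my $keep = matched_text($str, qr/\.\K[^.]*$/);

# In a substitution only the part after \K is replaced, so the dot survives.
(my $renamed = $str) =~ s/\.\K[^.]*$/bar/;

print "look-behind: $lb\n";      # $lb is "foo"
print "\\K: $keep\n";            # $keep is "foo"
print "renamed: $renamed\n";     # $renamed is "xyz.bar"
```

The substitution form is where \K is most convenient: without it you would have to capture the dot and put it back, e.g. s/(\.)[^.]*$/${1}bar/.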

The great part about the \K pattern is that the engine can start by looking for a literal (the dot) and so avoid a lot of backtracking. The output of use re 'debug'; makes this visible.
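To reproduce traces like the ones below, you can wrap each match in its own block with use re 'debug';. This is a sketch under the assumption that you only want these two regexes traced; the pragma is lexically scoped and writes the compile and match traces to STDERR. The exact trace format differs between perl versions, so your output may not look identical to the 5.10-era output shown here.

```perl
use strict;
use warnings;
use 5.010;

my $str = 'xyz.foo';
my ($lb_ok, $keep_ok);

{
    # 'debug' is lexically scoped: only regexes compiled inside this
    # block are traced. The trace is printed to STDERR.
    use re 'debug';
    $lb_ok = $str =~ /(?<=[.])[^.]*$/;
}

{
    use re 'debug';
    $keep_ok = $str =~ /\.\K[^.]*$/;
}

# Plain results on STDOUT, separate from the trace on STDERR.
print "look-behind matched: ", ($lb_ok   ? "yes" : "no"), "\n";
print "\\K matched: ",         ($keep_ok ? "yes" : "no"), "\n";
```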

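With the look-behind pattern, you can see that there's a lot of backtracking going on. Because the dot sits inside the look-behind, the optimizer only knows about the floating end of string (""$), so it guesses a match at the beginning of the string and then retries the look-behind at every position until it finds the dot (the string is "xyz.foo" in the examples below).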

    Compiling REx "(?<=[.])[^.]*$"
    Final program:
       1: IFMATCH[-1] (7)
       3:   EXACT <.> (5)
       5:   SUCCEED (0)
       6:   TAIL (7)
       7: STAR (19)
       8:   ANYOF[\0-\-/-\377{unicode_all}] (0)
      19: EOL (20)
      20: END (0)
    floating ""$ at 0..2147483647 (checking floating) minlen 0
    Guessing start of match in sv for REx "(?<=[.])[^.]*$" against "xyz.foo"
    Found floating substr ""$ at offset 7...
    Guessed: match at offset 0
    Matching REx "(?<=[.])[^.]*$" against "xyz.foo"
       0 <> <xyz.foo>   |  1:IFMATCH[-1](7)
                              failed...
       1 <x> <yz.foo>   |  1:IFMATCH[-1](7)
       0 <> <xyz.foo>   |  3:  EXACT <.>(5)
                                failed...
                              failed...
       2 <xy> <z.foo>   |  1:IFMATCH[-1](7)
       1 <x> <yz.foo>   |  3:  EXACT <.>(5)
                                failed...
                              failed...
       3 <xyz> <.foo>   |  1:IFMATCH[-1](7)
       2 <xy> <z.foo>   |  3:  EXACT <.>(5)
                                failed...
                              failed...
       4 <xyz.> <foo>   |  1:IFMATCH[-1](7)
       3 <xyz> <.foo>   |  3:  EXACT <.>(5)
       4 <xyz.> <foo>   |  5:  SUCCEED(0)
                                subpattern success...
       4 <xyz.> <foo>   |  7:STAR(19)
                              ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
       7 <xyz.foo> <>   | 19:  EOL(20)
       7 <xyz.foo> <>   | 20:  END(0)
    Match successful!

However, if we look at the \K pattern, we get this:

    Compiling REx "\.\K[^.]*$"
    Final program:
       1: EXACT <.> (3)
       3: KEEPS (4)
       4: STAR (16)
       5:   ANYOF[\0-\-/-\377{unicode_all}] (0)
      16: EOL (17)
      17: END (0)
    anchored "." at 0 floating ""$ at 1..2147483647 (checking anchored) minlen 1
    Guessing start of match in sv for REx "\.\K[^.]*$" against "xyz.foo"
    Found anchored substr "." at offset 3...
    Found floating substr ""$ at offset 7...
    Starting position does not contradict /^/m...
    Guessed: match at offset 3
    Matching REx "\.\K[^.]*$" against ".foo"
       3 <xyz> <.foo>   |  1:EXACT <.>(3)
       4 <xyz.> <foo>   |  3:KEEPS(4)
       4 <xyz.> <foo>   |  4:STAR(16)
                              ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
       7 <xyz.foo> <>   | 16:  EOL(17)
       7 <xyz.foo> <>   | 17:  END(0)
    Match successful!

That's nice: the engine finds the literal dot at offset 3 up front, so there is no backtracking.
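
If you want to check the efficiency claim on your own perl, a rough sketch with the core Benchmark module's cmpthese could look like this. The input string is a hypothetical one with many characters before a single dot, chosen so the look-behind version has many start positions to reject. No expected numbers are given: results depend on your machine and perl version, since the optimizer has changed across releases.

```perl
use strict;
use warnings;
use 5.010;
use Benchmark qw(cmpthese);

# Hypothetical input: a long name with one dot near the end.
my $str = ('x' x 1_000) . '.foo';

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    lookbehind => sub { my $ok = $str =~ /(?<=[.])[^.]*$/; },
    keep       => sub { my $ok = $str =~ /\.\K[^.]*$/; },
});
```

Benchmarking both against the same string also keeps the comparison honest: both patterns must select the same span, only the work the engine does to find it differs.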


In reply to Re: positive look behind regexp mystery (\K assertion) by lodin
in thread positive look behind regexp mystery by rovf
