This is a great case for the \K assertion. (Update: I forgot to mention that \K is new in Perl 5.10, but it is available to "everyone" via Regexp::Keep by Jeff Pinyan, who came up with the idea. I don't know whether that module gives you the same efficiency, though.) Not only is it easier, it is also more efficient thanks to the optimizations in the regexp engine. The pattern would look like this:
s/\.\K[^.]*$/txt/;
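To make the substitution concrete, here is a minimal sketch that runs it next to the look-behind form and a plain capture-group form that also works on perls before 5.10. The sub names and the sample strings other than "xyz.foo" are made up for illustration. All three should turn "xyz.foo" into "xyz.txt" and leave a name without a dot alone.

```perl
use strict;
use warnings;

# Hypothetical helper names; each one replaces whatever follows the last dot.
sub ext_keep       { my ($s) = @_; $s =~ s/\.\K[^.]*$/txt/;     return $s }  # \K needs 5.10+
sub ext_lookbehind { my ($s) = @_; $s =~ s/(?<=\.)[^.]*$/txt/;  return $s }
sub ext_capture    { my ($s) = @_; $s =~ s/(\.)[^.]*$/${1}txt/; return $s }  # pre-5.10 fallback

# Sample inputs are assumptions, chosen to cover one dot, several dots, and no dot.
for my $name ('xyz.foo', 'a.b.c', 'README') {
    printf "%-8s => %s\n", $name, ext_keep($name);
}
```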
The great part with this is that the engine can start by looking for a literal (the dot) and so avoid a lot of backtracking. The output of use re 'debug'; will visualize this.
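If you want to reproduce the traces yourself, something like the following should do. It is only a sketch: the variable name is my own, and the exact trace text differs between perl versions. The pragma is lexically scoped, so the bare block keeps the noise limited to the one regex, and the trace goes to STDERR.

```perl
use strict;
use warnings;

my $name = "xyz.foo";    # same sample string as in the traces

{
    # Lexically scoped: only regexes compiled inside this block are traced.
    use re 'debug';
    $name =~ s/\.\K[^.]*$/txt/;
}

print "$name\n";         # the result goes to STDOUT, the trace to STDERR
```

Swap in (?<=[.])[^.]*$ to get the look-behind trace for comparison.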
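With the look-behind pattern, you see there's a lot of wasted work going on: the engine guesses that the match starts at the beginning of the string and then has to try the look-behind at every offset, failing each time, until it gets past the dot (the string is "xyz.foo" in the examples below).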
Compiling REx "(?<=[.])[^.]*$"
Final program:
1: IFMATCH[-1] (7)
3: EXACT <.> (5)
5: SUCCEED (0)
6: TAIL (7)
7: STAR (19)
8: ANYOF[\0-\-/-\377{unicode_all}] (0)
19: EOL (20)
20: END (0)
floating ""$ at 0..2147483647 (checking floating) minlen 0
Guessing start of match in sv for REx "(?<=[.])[^.]*$" against "xyz.foo"
Found floating substr ""$ at offset 7...
Guessed: match at offset 0
Matching REx "(?<=[.])[^.]*$" against "xyz.foo"
0 <> <xyz.foo> | 1:IFMATCH[-1](7)
failed...
1 <x> <yz.foo> | 1:IFMATCH[-1](7)
0 <> <xyz.foo> | 3: EXACT <.>(5)
failed...
failed...
2 <xy> <z.foo> | 1:IFMATCH[-1](7)
1 <x> <yz.foo> | 3: EXACT <.>(5)
failed...
failed...
3 <xyz> <.foo> | 1:IFMATCH[-1](7)
2 <xy> <z.foo> | 3: EXACT <.>(5)
failed...
failed...
4 <xyz.> <foo> | 1:IFMATCH[-1](7)
3 <xyz> <.foo> | 3: EXACT <.>(5)
4 <xyz.> <foo> | 5: SUCCEED(0)
subpattern success...
4 <xyz.> <foo> | 7:STAR(19)
ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
7 <xyz.foo> <> | 19: EOL(20)
7 <xyz.foo> <> | 20: END(0)
Match successful!
However, if we look at the \K pattern, we get this:
Compiling REx "\.\K[^.]*$"
Final program:
1: EXACT <.> (3)
3: KEEPS (4)
4: STAR (16)
5: ANYOF[\0-\-/-\377{unicode_all}] (0)
16: EOL (17)
17: END (0)
anchored "." at 0 floating ""$ at 1..2147483647 (checking anchored) minlen 1
Guessing start of match in sv for REx "\.\K[^.]*$" against "xyz.foo"
Found anchored substr "." at offset 3...
Found floating substr ""$ at offset 7...
Starting position does not contradict /^/m...
Guessed: match at offset 3
Matching REx "\.\K[^.]*$" against ".foo"
3 <xyz> <.foo> | 1:EXACT <.>(3)
4 <xyz.> <foo> | 3:KEEPS(4)
4 <xyz.> <foo> | 4: STAR(16)
ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
7 <xyz.foo> <> | 16: EOL(17)
7 <xyz.foo> <> | 17: END(0)
Match successful!
That's nice. No backtracking.
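If you would rather measure than read traces, a rough sketch with the core Benchmark module could look like this. The test string (a long run of non-dot characters before the extension) and the labels are my own assumptions, and the numbers will depend on your perl and your data, so treat it as a starting point only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 1000 non-dot characters followed by ".foo".
my $str = ('x' x 1000) . '.foo';

# Each candidate works on a copy so every run sees the same input.
my %candidates = (
    keep       => sub { my $s = $str; $s =~ s/\.\K[^.]*$/txt/;    $s },
    lookbehind => sub { my $s = $str; $s =~ s/(?<=\.)[^.]*$/txt/; $s },
);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```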
lodin