positive look behind regexp mystery

rovf has asked for the wisdom of the Perl Monks concerning the following question:

I wanted to replace the extension of a filename with 'txt', that is, to transform abc/xyz.something into abc/xyz.txt. This is easy, but, perhaps driven by the sudden thought that I'm becoming an old man and the years have passed without my ever having tried a positive look-behind regexp, I came up with the following silly solution:

use strict;
use warnings;
my $fn="abc/def.xyz";
$fn =~ s/($<=[.])[^.]*$/txt/;
print "$fn\n";

That is, substitute the longest string at the end of the filename that does not contain a period but is preceded by one. Interestingly, this did not work: no substitution took place.

In my case this is overkill, because I happen to know that there *is* a period in my filename, so I could have written, much more simply,

$fn =~ s/[^.]*$/txt/;

(This would, however, replace the complete filename with txt if it doesn't contain a period.)
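A minimal sketch of that caveat, comparing the simple substitution with the intended look-behind version (written with the correct `(?<=` syntax) on three hypothetical inputs. The third input shows that neither pattern protects a period in a directory name: `abc.d/xyz` becomes `abc.txt` with both.

```perl
use strict;
use warnings;

my %result;
for my $orig ("abc/def.xyz", "abc/def", "abc.d/xyz") {
    # Copy, then substitute on the copy.
    (my $simple     = $orig) =~ s/[^.]*$/txt/;          # no look-behind
    (my $lookbehind = $orig) =~ s/(?<=[.])[^.]*$/txt/;  # only right after a '.'
    $result{$orig} = [$simple, $lookbehind];
    print "$orig -> simple: $simple, look-behind: $lookbehind\n";
}
```

For "abc/def" the simple pattern matches the whole string and yields "txt", while the look-behind version finds no '.' and leaves the name unchanged.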

Nevertheless, I would like to know *why* my original solution failed. Any suggestions?

--
Ronald Fischer <ynnor@mm.st>

Re: positive look behind regexp mystery by moritz (Cardinal) on Aug 01, 2008 at 08:41 UTC
The syntax for look-behinds is `(?<=...)`, not `($<=...)`.
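A minimal sketch of the corrected script. It also shows why the typo failed silently instead of raising an error: inside a pattern, `$<` (Perl's real-user-ID variable) is interpolated as in a double-quoted string, so the engine never sees a look-behind, only a capture group that requires literal text such as "1000=." in the filename. The UID printed depends on your system.

```perl
use strict;
use warnings;

my $fn = "abc/def.xyz";

# The typo: $< is interpolated, so "($<=[.])" turns into something like
# "(1000=[.])", an ordinary capture group that cannot match this filename.
my $typo = qr/($<=[.])[^.]*$/;
print "typo pattern: $typo\n";
my $typo_matched = ($fn =~ $typo) ? 1 : 0;

# The fix: (?<=[.]) is a zero-width positive look-behind. It requires a '.'
# right before the match without consuming it, so only the extension changes.
$fn =~ s/(?<=[.])[^.]*$/txt/;
print "$fn\n";    # abc/def.txt
```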
Re^2: positive look behind regexp mystery by rovf (Priest) on Aug 01, 2008 at 09:55 UTC
> The syntax for look-behinds is (?<=...), not ($<=...)

:-O I definitely wanted to type a '?', and when I was looking at the code, my brain kept telling me that there was a question mark! I simply did not see the typo!!!! Thanks a lot!

--
Ronald Fischer <ynnor@mm.st>
Re: positive look behind regexp mystery (\K assertion) by lodin (Hermit) on Aug 14, 2008 at 12:54 UTC
This is a great case for the `\K` assertion. (Update: I forgot to mention that `\K` is new in 5.10, but it is available to "everyone" via Regexp::Keep by Jeff Pinyan, who came up with the idea. I don't know whether that gives you the same efficiency, though.) Not only is it easier, it's also more efficient thanks to the optimizations of the regexp engine. The pattern would look like this:

    s/\.\K[^.]*$/txt/;

The great part is that the engine can start by looking for a literal (the dot) and avoid a lot of backtracking. The output of `use re 'debug';` makes this visible. With the look-behind pattern, you can see a lot of backtracking going on, and the engine guesses a match at the very beginning of the string (the string is "xyz.foo" in the examples below):

    Compiling REx "(?<=[.])[^.]*$"
    Final program:
       1: IFMATCH[-1] (7)
       3:   EXACT <.> (5)
       5:   SUCCEED (0)
       6:   TAIL (7)
       7: STAR (19)
       8:   ANYOF[\0-\-/-\377{unicode_all}] (0)
      19: EOL (20)
      20: END (0)
    floating ""$ at 0..2147483647 (checking floating) minlen 0
    Guessing start of match in sv for REx "(?<=[.])[^.]*$" against "xyz.foo"
    Found floating substr ""$ at offset 7...
    Guessed: match at offset 0
    Matching REx "(?<=[.])[^.]*$" against "xyz.foo"
       0 <> <xyz.foo>      |  1:IFMATCH[-1](7)
                                failed...
       1 <x> <yz.foo>      |  1:IFMATCH[-1](7)
       0 <> <xyz.foo>      |  3:  EXACT <.>(5)
                                  failed...
                                failed...
       2 <xy> <z.foo>      |  1:IFMATCH[-1](7)
       1 <x> <yz.foo>      |  3:  EXACT <.>(5)
                                  failed...
                                failed...
       3 <xyz> <.foo>      |  1:IFMATCH[-1](7)
       2 <xy> <z.foo>      |  3:  EXACT <.>(5)
                                  failed...
                                failed...
       4 <xyz.> <foo>      |  1:IFMATCH[-1](7)
       3 <xyz> <.foo>      |  3:  EXACT <.>(5)
       4 <xyz.> <foo>      |  5:  SUCCEED(0)
                                  subpattern success...
       4 <xyz.> <foo>      |  7:STAR(19)
                                ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
       7 <xyz.foo> <>      | 19:  EOL(20)
       7 <xyz.foo> <>      | 20:  END(0)
    Match successful!

However, if we look at the `\K` pattern, we get this:

    Compiling REx "\.\K[^.]*$"
    Final program:
       1: EXACT <.> (3)
       3: KEEPS (4)
       4: STAR (16)
       5:   ANYOF[\0-\-/-\377{unicode_all}] (0)
      16: EOL (17)
      17: END (0)
    anchored "." at 0 floating ""$ at 1..2147483647 (checking anchored) minlen 1
    Guessing start of match in sv for REx "\.\K[^.]*$" against "xyz.foo"
    Found anchored substr "." at offset 3...
    Found floating substr ""$ at offset 7...
    Starting position does not contradict /^/m...
    Guessed: match at offset 3
    Matching REx "\.\K[^.]*$" against ".foo"
       3 <xyz> <.foo>      |  1:EXACT <.>(3)
       4 <xyz.> <foo>      |  3:KEEPS(4)
       4 <xyz.> <foo>      |  4:STAR(16)
                                ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
       7 <xyz.foo> <>      | 16:  EOL(17)
       7 <xyz.foo> <>      | 17:  END(0)
    Match successful!

That's nice. No backtracking.

lodin
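A minimal sketch comparing the `\K` substitution with a plain alternative that needs neither `\K` nor a look-behind, so it should also run on 5.8: match the dot and write it back in the replacement. The variable names are illustrative.

```perl
use strict;
use warnings;
use 5.010;    # \K requires perl 5.10 or later

my $fn = "abc/def.xyz";

# \K keeps everything matched before it out of the replaced text,
# so only the part after the '.' is substituted.
(my $with_keep = $fn) =~ s/\.\K[^.]*$/txt/;

# Portable equivalent: consume the '.' and put it back in the replacement.
(my $plain = $fn) =~ s/\.[^.]*$/.txt/;

print "$with_keep\n";    # abc/def.txt
print "$plain\n";        # abc/def.txt
```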
Re^2: positive look behind regexp mystery (\K assertion) by rovf (Priest) on Aug 14, 2008 at 13:28 UTC
> This is a great case for the \K assertion.

I have never heard of \K and can't find it in perlre. Is this a very new feature?

--
Ronald Fischer <ynnor@mm.st>
Re^3: positive look behind regexp mystery (\K assertion) by lidden (Curate) on Aug 14, 2008 at 14:31 UTC
It was added in 5.10.0.
Re^3: positive look behind regexp mystery (\K assertion) by massa (Hermit) on Aug 14, 2008 at 14:42 UTC
The doc you linked to (perlre) has tons of references to \K. :-) You, like me, must still be on perl 5.8, and we don't have it in our docs. But if you are on 5.10, then read it again :-)

[]s, HTH,
Massa (κς,πμ,πλ)
Re^4: positive look behind regexp mystery (\K assertion) by rovf (Priest) on Aug 19, 2008 at 08:27 UTC
Re: positive look behind regexp mystery by BrowserUk (Patriarch) on Aug 01, 2008 at 08:48 UTC
S'funny. It works for me?

`$s = 'abc.def'; $s =~ s[(?<=[.])[^.]*$][xyz]; print $s;;`

prints

`abc.xyz`

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re: positive look behind regexp mystery by Anonymous Monk on Aug 01, 2008 at 08:32 UTC
see `use re 'debug';`
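A minimal sketch of that suggestion, applied to the original typo. The `use re 'debug'` pragma writes a compile and match trace to STDERR; with the typo in place, the "Compiling REx" line should show your numeric user ID where `$<` was written, which gives the bug away. The bare block is meant to keep the pragma's effect local, since the `re` documentation describes it as lexically scoped in perl 5.10 and later.

```perl
use strict;
use warnings;

my $fn = "abc/def.xyz";
{
    # Trace goes to STDERR for patterns compiled in this block.
    use re 'debug';
    $fn =~ s/($<=[.])[^.]*$/txt/;    # the original typo, kept on purpose
}
print "$fn\n";    # unchanged: abc/def.xyz
```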