Substitute for variable-length look-behind?

diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

use Term::ANSIColor ':constants';

# Note that there is not normally a space between the </ and code>. I added
# that so that Perlmonks.org wouldn't parse the post incorrectly.
$_ = "normal <code> green <!-- yellow [red] normal --> normal </ code> normal";
$GREEN = GREEN;
$YELLOW = YELLOW;
$RED = RED;
$RESET = RESET;
s(<code>(.+?)</ code>)($GREEN$1$RESET)g;
s((?<=\x1b)\[(.+?)\])($RED$1$RESET)g;
s((<!--.+?-->))($YELLOW$1$RESET)g;

The preceding program produces the following string. Overlapping is not detected and each markup element terminates the enclosing element prematurely.

"normal " . GREEN . " green " . YELLOW . "" . RESET . " normal" . RESET . " normal"

I'd like it to produce the string below instead. If Perl had variable-length look-behind, I'd write the assertion as (?<=(?:^|$RESET)[^\x1b]*). Any ideas for how to write this without using Perl's experimental regexp features ((?{ code }), (??{ rx }), and (?(expr)true|false))?

"normal " . GREEN . " green  normal" . RESET . " normal"


Replies are listed 'Best First'.
Re: Substitute for variable-length look-behind? by particle (Vicar) on Jan 26, 2004 at 18:15 UTC
reverse the string and use a variable-width look-ahead assertion. ~Particle accelerates
Re: Re: Substitute for variable-length look-behind? by belg4mit (Prior) on Jan 26, 2004 at 21:56 UTC
aka sexeger ... my thoughts exactly. `-- I'm not belgian but I play one on TV.`
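
The reverse-and-look-ahead trick (sexeger) can be sketched roughly as follows. This is a minimal illustration rather than code from the thread: the helper name `first_bar_after_foo` and the sample pattern are assumptions chosen for the example. It emulates what `(?<=fo+)bar` would mean by reversing the string, writing every piece of the pattern backwards, and converting the match offset back to the original string.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first "bar" that is preceded by "f" plus one
# or more "o"s, i.e. what (?<=fo+)bar would mean if Perl accepted it.
sub first_bar_after_foo {
    my ($text) = @_;
    my $rev = reverse $text;    # scalar context: reverses the characters
    my $found;
    # In the reversed string the look-behind becomes a look-ahead, and a
    # look-ahead may be variable length. Each literal is spelled backwards.
    while ($rev =~ /rab(?=o+f)/g) {
        # The end of a match in $rev corresponds to its start in $text.
        # Later hits in $rev are earlier in $text, so keep overwriting.
        $found = length($text) - $+[0];
    }
    return $found;              # undef when nothing matched
}

print first_bar_after_foo('bar obar foobar'), "\n";   # prints 12
```

The cost is readability: every literal and every quantified piece has to be mirrored by hand, which gets awkward for patterns like the `$RESET` escape sequence in the question.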
Re: Substitute for variable-length look-behind? by Roy Johnson (Monsignor) on Jan 26, 2004 at 19:35 UTC
Here's another solution, somewhat more elegant than my previous one.

s{<code>(.+?)</ code>|\[(.+?)\]|(<!--.+?-->)}
 {defined $1 ? "$GREEN$1$RESET" :
  defined $2 ? "$RED$2$RESET" :
  defined $3 ? "$YELLOW$3$RESET" :
  warn "Broken with $+\n"
 }ge;

The PerlMonk `tr///` Advocate
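
To see what a single-pass alternation like this does with the sample input, here is a self-contained sketch. Several things in it are assumptions made for the demo: the sub name `colorize` is hypothetical, the placeholder markers `{G}`, `{R}`, `{Y}` and `{/}` stand in for the ANSI escapes so the result is readable, the closing tag is written as a normal `</code>` without the PerlMonks workaround space, and the final `warn` branch is dropped.

```perl
use strict;
use warnings;

# Placeholder markers instead of Term::ANSIColor constants (demo assumption).
my ($GREEN, $RED, $YELLOW, $RESET) = ('{G}', '{R}', '{Y}', '{/}');

# Hypothetical wrapper around the single-pass substitution.
sub colorize {
    my ($text) = @_;
    # One left-to-right pass: whichever element starts first consumes its
    # whole span, so markup nested inside it is never matched on its own.
    $text =~ s{<code>(.+?)</code>|\[(.+?)\]|(<!--.+?-->)}
              {   defined $1 ? "$GREEN$1$RESET"
                : defined $2 ? "$RED$2$RESET"
                :              "$YELLOW$3$RESET" }ge;
    return $text;
}

print colorize("normal <code> green <!-- yellow [red] normal --> normal </code> normal"), "\n";
# prints: normal {G} green <!-- yellow [red] normal --> normal {/} normal
```

Because the substitution never rescans text it has already replaced, the inserted color codes cannot be mistaken for markup, which is what the look-behind in the question was guarding against.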
Re: Substitute for variable-length look-behind? by diotalevi (Canon) on Jan 26, 2004 at 21:13 UTC
I ran with a combination of ideas from Roy, tye, and Zaxo: s///e to get multiple lvalues into a string (one lvalue per line in the string), then Roy's idea to walk pos() in that line, followed up with some modifications to substr() lvalues.

s(^(.+?\Q$RESET | \E)(.+)){
    my $header  = $1;
    my $comment = $2;
    $comment =~ s((?: <code> (.*?) </ code>
                    | \[ (.*?) \]
                    | (<!-- .*? -->)
                  )){
        (    ( defined( $1 ) && ( GREEN  . $1 ) )
          || ( defined( $2 ) && ( RED    . $2 ) )
          || ( defined( $3 ) && ( YELLOW . $3 ) ) ) . RESET
    }gex;
    "$header$comment";
}meg;
Re: Substitute for variable-length look-behind? by bart (Canon) on Jan 26, 2004 at 20:30 UTC
You must take care that your lookbehind, if incorporated into the same regexp, doesn't change what you match. If it's too greedy, the starting point of the former pattern might shift backwards. So, IMO, the safest way is to use two patterns. I'm not sure you can integrate it into one pattern (I doubt it) so that the whole match simply fails if the lookbehind fails. Two independent matches won't do that. Anyway, enough blahblah, here's my coarse idea:

my $success;
while(/PATTERN/g) {
    if(substr($_, 0, $-[0]) =~ /LOOKBEHIND\z/) {
        # got a match!
        $success = 1;
        last;
    }
}

For example:

$_ = 'bar bar obar foooooobar bar';
my($success, $start) = 0;
while(/bar/g) {
    if(substr($_, 0, $start = $-[0]) =~ /fo+\z/) {
        # got a match!
        $success = 1;
        last;
    }
}
print "$success: $start\n";

printing: 1: 35

Using `$start`, the start position of the outer match, you can try again if you want, to get the captured values and @- and @+. For some odd reason, capturing @- and @+ in the loop made it loop forever. shrug
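
The two-pattern idea can be wrapped in a small reusable sub. This is only a sketch under stated assumptions: the name `match_with_lookbehind` and its argument order are hypothetical, and the `$outside` pattern is one reading of the look-behind from the question (start of string or a reset sequence, followed only by non-ESC characters).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): return the offset of the first
# match of $pattern whose preceding text ends with $lookbehind, else undef.
sub match_with_lookbehind {
    my ($text, $pattern, $lookbehind) = @_;
    while ($text =~ /$pattern/g) {
        my $start = $-[0];    # save it: the inner match overwrites @- and @+
        return $start
            if substr($text, 0, $start) =~ /(?:$lookbehind)\z/;
    }
    return;                   # no match satisfied the look-behind
}

# Assumed form of the question's look-behind, anchored with \z by the helper.
my $RESET   = "\x1b[0m";
my $outside = qr/(?:^|\Q$RESET\E)[^\x1b]*/;

print match_with_lookbehind("plain [red]", qr/\[/, $outside), "\n";   # prints 6
```

The inner match runs against a copy of the prefix, so it cannot move `pos()` of the outer string. Each candidate re-scans the prefix, so the approach is quadratic in the worst case, which is fine for short strings.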
Re: Substitute for variable-length look-behind? by Roy Johnson (Monsignor) on Jan 26, 2004 at 18:25 UTC
Clunky, but may be instructional.

while (m(<code>|\[|<!--)g) {
    if ($& eq '<code>') {
        s{\G(.+?)</ code>}{$GREEN$1$RESET};
        pos($_) += length("$GREEN$1$RESET");
    }
    elsif ($& eq '[') {
        s{\G(.+?)\]}{$RED$1$RESET};
        pos($_) += length("$RED$1$RESET");
    }
    elsif ($& eq '<!--') {
        s{\G(.+?)-->}{$YELLOW<!--$1-->$RESET};
        pos($_) += length("$YELLOW$1$RESET");
    }
}

The PerlMonk `tr///` Advocate
Re: Substitute for variable-length look-behind? by sleepingsquirrel (Chaplain) on Jan 26, 2004 at 19:18 UTC
This snippet might work for you. I'm assuming you want no nesting of tags and the first (leftmost) one wins.

use Term::ANSIColor ':constants';

# Note that there is not normally an 'x' between the </ and code>. I
# added that so that Perlmonks.org wouldn't parse the post incorrectly.
$_ = "normal <code> green <!-- yellow [red] normal --> normal </xcode> normal";
$delimit{"code"} = GREEN;
$delimit{"--"} = YELLOW;
$delimit{"]"} = RED;
$RESET = RESET;

s{ (?:<code>(.+?)</x(code)>) |
   (?:(?<=\x1b)\[(.+?)(\])) |
   (?:(<!--.+?(--)>))
 }{$delimit{$2|$4|$6}.($1|$3|$5).$RESET}egx;
print "$_\n";
Re: Substitute for variable-length look-behind? by Abigail-II (Bishop) on Jan 26, 2004 at 17:48 UTC
"Any ideas for how to write this without using perl's experimental regexp features"

Use a parser.

Abigail
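
One possible reading of "use a parser" is a hand-rolled scanner built on `\G` and the `/gc` modifiers, sketched below. None of this comes from Abigail's reply: the sub name `colorize_scan`, the placeholder color markers, and the plain `</code>` closing tag are all assumptions for the sake of a readable demo.

```perl
use strict;
use warnings;

# Placeholder markers instead of Term::ANSIColor constants (demo assumption).
my ($GREEN, $RED, $YELLOW, $RESET) = ('{G}', '{R}', '{Y}', '{/}');

# Hypothetical scanner: consume the input left to right, one token at a time.
# The outermost element wins; anything nested inside it is copied verbatim.
sub colorize_scan {
    my ($text) = @_;
    my $out = '';
    for ($text) {    # alias $_ so every m//gc below shares one pos()
        while (1) {
            if    (/\G<code>(.+?)<\/code>/gcs) { $out .= "$GREEN$1$RESET" }
            elsif (/\G\[(.+?)\]/gcs)           { $out .= "$RED$1$RESET" }
            elsif (/\G(<!--.+?-->)/gcs)        { $out .= "$YELLOW$1$RESET" }
            elsif (/\G([^<\[]+|.)/gcs)         { $out .= $1 }   # plain text
            else                               { last }         # end of input
        }
    }
    return $out;
}

print colorize_scan("normal <code> green <!-- yellow [red] normal --> normal </code> normal"), "\n";
# prints: normal {G} green <!-- yellow [red] normal --> normal {/} normal
```

Since output is built in a separate string, the scanner never sees its own escape sequences, so no look-behind is needed at all. An unterminated opening delimiter simply falls through to the plain-text branch.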

