PerlMonks  

Re: Backreferences in negated character classes

by hv (Prior)
on Dec 21, 2005 at 10:55 UTC (id://518279)


in reply to Backreferences in negated character classes

The first problem is that character classes are constructed when the regexp is compiled and do not change during the matching process. Because of that, the special syntax for backreferences does not extend inside a character class, so, as tye mentioned, the '\2' is actually treated as an octal escape: the character with code 2 (ASCII STX, "\x02"), not a reference to group 2.
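A quick way to see this, as a minimal sketch (the test strings are made up for illustration): in the class below, \1 is the octal escape for chr(1), so the class rejects only that control character and ignores whatever group 1 captured.

```perl
use strict;
use warnings;

# Inside [...], \1 is not a backreference: it is the octal escape for
# the character with code 1 ("\x01"), so [^\1] means "anything but chr(1)".
my $re = qr/\A(\w)[^\1]\z/;

# "AA": the second A repeats group 1, but it is not chr(1), so it matches.
print "AA     -> ", ("AA" =~ $re ? "match" : "no match"), "\n";

# "A\x01": the second character really is chr(1), so the class rejects it.
print "A\\x01  -> ", ("A\x01" =~ $re ? "match" : "no match"), "\n";
```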

You could circumvent that by getting clever with deferred evals ((??{ ... }), which let you create new regexps to be compiled while matching), but you don't want to do that: the negative lookahead is definitely the way to go.
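For illustration only, here is roughly what the deferred-eval route looks like. This is a minimal sketch, not a recommendation: (??{ ... }) runs Perl code at that point in the match and compiles the string it returns as a sub-pattern, so a character class can be rebuilt from $1 on every attempt. The pattern and test strings are hypothetical.

```perl
use strict;
use warnings;

# (??{ ... }) runs when the engine reaches it; the string it returns is
# compiled and matched as a sub-pattern right there. Here it builds a new
# class "[^X]" from whatever group 1 captured. ($1 is a single \w
# character, so it needs no escaping.)
my $re = qr{ \A (\w) \1 (??{ "[^$1]" }) \z }x;

for my $s (qw(AAB AAA)) {
    printf "%s -> %s\n", $s, ($s =~ $re ? "match" : "no match");
}
```

The lookahead version (?!\1)\w does the same job without running code during the match, which is why it is the better choice.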

All of the extended features in the regexp engine are of the form (?...), to avoid clashing with any previously valid syntax; the ($!\2) in your example actually interpolated the $! error variable into your regexp - presumably the empty string.
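A small sketch of the effect of that typo, simplified to a single backreference (the strings are chosen for illustration, and $! is cleared first so that it interpolates as the empty string): ($!\1) becomes the ordinary capturing group (\1), which consumes a character, whereas (?!\1) consumes nothing. So the typo version matches one character more than intended.

```perl
use strict;
use warnings;

$! = 0;   # no pending error, so "$!" interpolates as the empty string

# The typo: "($!\1)" interpolates $! and leaves the capture group "(\1)",
# which consumes a third copy of the first character.
my $typo  = qr/\A(\w)\1($!\1)(\w)\z/;

# The intended zero-width negative lookahead consumes nothing.
my $fixed = qr/\A(\w)\1(?!\1)(\w)\z/;

print "typo  on AAAB: ", ("AAAB" =~ $typo  ? "match" : "no match"), "\n";
print "fixed on AAB:  ", ("AAB"  =~ $fixed ? "match" : "no match"), "\n";
print "fixed on AAA:  ", ("AAA"  =~ $fixed ? "match" : "no match"), "\n";
```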

So a correct solution would look something like:

m{ (\w) \1 (?! \1) (\w) }x;
... and for the extended example:
m{ (\w) \1 (?! \1) (\w) (?! \1 | \2) (\w) \3 \3 \3 \1 (?! \1 | \2 | \3) (\w) }x;
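To check the two patterns above, a minimal test sketch. Assumption for this demo: the patterns are wrapped in \A and \z so that only whole strings are tested; the sample strings are made up.

```perl
use strict;
use warnings;

# The two patterns above, anchored for whole-string tests.
my $short = qr{ \A (\w) \1 (?! \1) (\w) \z }x;
my $long  = qr{ \A (\w) \1 (?! \1) (\w) (?! \1 | \2) (\w) \3 \3 \3 \1
                   (?! \1 | \2 | \3) (\w) \z }x;

for my $s (qw(AAB AAA)) {
    printf "%-10s %s\n", $s, ($s =~ $short ? "match" : "no match");
}
# Any letters work as long as the repetition shape is the same.
for my $s (qw(AABCCCCAD XXYZZZZXW AABCCCCAB AAACCCCAD)) {
    printf "%-10s %s\n", $s, ($s =~ $long ? "match" : "no match");
}
```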

This gives you something nice and regular - it would be quite easy to write code to generate the above from the example string. Here's how it might work:

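```perl
use strict;
use warnings;

my $s = 'AABCCCCAD';
our $DEBUG = 1;
print +($s =~ mkre($s)) ? "ok\n" : "fail\n";

sub mkre {
    my $s = shift;
    my $index = 0;      # number of capture groups created so far
    my(%seen, @elems);  # %seen maps each character to its group number
    for (split //, $s) {
        if ($seen{$_}) {
            # repeated character: backreference to its group
            push @elems, "\\$seen{$_}";
        } else {
            # new character: must differ from every earlier group
            push @elems, sprintf '(?! %s)', join ' | ', map "\\$_", 1 .. $index
                if $index;
            $seen{$_} = ++$index;
            push @elems, '(\\w)';
        }
    }
    my $re = join ' ', @elems;
    warn "$s: $re\n" if $DEBUG;
    qr/$re/x;
}
```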

Hugo

Re^2: Backreferences in negated character classes
by bobf (Monsignor) on Dec 21, 2005 at 15:22 UTC

    Many thanks for all of the responses. GrandFather, you were right on target as always; tye, thanks for cutting to the heart of the problem and pointing out a duh moment for me. :-)

    > The first problem is that character classes are constructed when the regexp is compiled, and do not change during the matching process. Because of that the special syntax for backreferences in regexps does not extend inside the character class, so as tye mentioned the '\2' is actually treated as ASCII character 2.
    > the ($!\2) in your example actually interpolated the $! error variable into your regexp

    Thank you for stating that so explicitly; that was a core piece of knowledge that I was missing. Now I understand why my incorrect negative lookahead was matching four characters. I can't believe I missed the obvious typo in the lookahead ($! instead of ?!). I guess that's what I get for playing with regexen so late at night. :-)

    > This gives you something nice and regular - it would be quite easy to write code to generate the above from the example string. Here's how it might work:

    Thanks for the great example for building this type of regex on the fly. I wanted to capture the whole match, so I changed it as follows:

    ```perl
    my $regex = mkre($s);
    while( $string =~ m/$regex/g ) {
        print $1, "\n";
        # do other stuff
    }

    sub mkre {
        my $s = shift;
        my $index = 1; # using \1 to capture the whole match
        my(%seen, @elems);
        for (split //, $s) {
            if ($seen{$_}) {
                push @elems, "\\$seen{$_}";
            } else {
                push @elems, sprintf '(?! %s)', join ' | ', map "\\$_", 2 .. $index
                    if $index > 1; # changed to start with \2
                $seen{$_} = ++$index;
                push @elems, '(\\w)';
            }
        }
        my $re = join( ' ', '(', @elems, ')' ); # create \1
        warn "$s: $re\n" if $DEBUG;
        qr/$re/x;
    }
    ```

    Then I realized I could have left the sub as-is and just printed $& instead. :-)
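    A minimal sketch comparing the two approaches on a made-up string, using hand-written patterns for the 'AAB' shape rather than mkre's output: wrapping everything in an extra group shifts the letter groups to \2 and \3 (which is why the modified generator starts numbering at 2), while $& returns the same text with the original numbering. Note that on perls older than 5.20, using $& anywhere in a program imposed a performance penalty on every successful regex match.

    ```perl
    use strict;
    use warnings;

    my $text = "xAAByyZ";

    # Extra outer group: it becomes \1, so the letters are \2 and \3.
    my $wrapped = qr/((\w)\2(?!\2)(\w))/;
    # Original numbering: letters are \1 and \2; $& holds the whole match.
    my $plain   = qr/(\w)\1(?!\1)(\w)/;

    my (@via_group, @via_amp);
    push @via_group, $1 while $text =~ /$wrapped/g;
    push @via_amp,   $& while $text =~ /$plain/g;

    print "outer group: @via_group\n";
    print "\$&:          @via_amp\n";
    ```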

    Thanks again for the help, and for such an elegant solution.

    Update: japhy++. Very nice solution; taking that approach would enable me to create much more flexible (and more powerful) regexps. Thanks for posting it.
