in reply to (Ab)using the Regex Engine

#!/usr/bin/perl
use strict;
# use warnings;

my $match = qr[([ab]+)([ab]+)];
my $str = 'aba';

$str =~ /^ $match $ (?{ print "1: $1-$2\n" }) [c] /x;
$str =~ /^ $match $ (?{ print "2: $1-$2\n" }) (?!) /x;
$str =~ /^ $match $ (??{ print "3: $1-$2\n"; qr[(?!)] }) /x;

# The fact that this is a defined pattern probably means
# there is a lesser chance of it being optimized away.
$str =~ /^ $match $ (?{ print "4: $1-$2\n" }) (*FAIL) /x;

Re^2: (Ab)using the Regex Engine
by jo37 (Friar) on May 26, 2020 at 10:41 UTC

    A very good point that is indeed the ultimate answer to my question, as it points to Special Backtracking Control Verbs.

    It states that (*FAIL) can be used to force the engine into backtracking and that this is equivalent to (?!). So version 2 and yours are basically the same and both are guaranteed to work. The trickery from version 3 is not needed.

    So in the end it is "use" and not "abuse".



      I'd call it "abuse". My bet is this pattern of application is well-known and tolerated for the sake of critical mass of existing "cool examples of (ab)using re-engine", and therefore safe to use in the future :). Stand-alone (*F) is guaranteed to fail, there's no need to "force to backtrack" while staying in the same branch; and as there are no other branches in your example, the whole matching must have been optimized away. On the other hand, something like (?(?{CODE})(*F)), with CODE result depending on sub-matches so far, is legitimate use and another matter entirely, but not the case here.
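For readers who have not met the conditional form, here is a minimal sketch of (?(?{CODE})(*F)) where the code block really does depend on the sub-matches so far. The helper name equal_halves and the "both halves must be identical" rule are made up for illustration and are not from this thread; a plain backreference would do the same job, so treat it only as a demonstration of the construct.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into two identical, non-empty runs of [ab].
sub equal_halves {
    my ($str) = @_;

    # The conditional runs (*F) only when the code block returns true,
    # i.e. when the two captures differ. That rejects the current split
    # and sends the engine back into the groups to try the next one.
    if ( $str =~ /^ ([ab]+) ([ab]+) $ (?(?{ $1 ne $2 })(*F)) /x ) {
        return ( $1, $2 );
    }
    return;    # no split with equal halves
}

my @halves = equal_halves('abab');
print @halves ? "@halves\n" : "no split\n";
```

Here (*F) only fires on splits the code block rejects, so a successful match is still possible, unlike the stand-alone (*F) case discussed above.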

      My impression is that this tolerance goes so far that injecting (*F) makes the engine (though not always) fail to fail early, which is funny.

      my $match = qr[([ab]+)([ab]+)];
      my $str = 'aba';

      $str =~ /^ $match $ (?{ print "1: $1-$2\n" }) a /x;
      $str =~ /^ $match $ (?{ print "2: $1-$2\n" }) b /x;
      $str =~ /^ $match $ (?{ print "3: $1-$2\n" }) (*F) b /x;
      $str =~ /^ $match $ (?{ print "4: $1-$2\n" }) (*F) .. /x;
      __END__
      1: ab-a
      1: a-ba
      3: ab-a
      3: a-ba

        Probably my statement in Re^2: (Ab)using the Regex Engine about "use" vs. "abuse" was unclear and I should have quoted the relevant section from perlre:

        (*FAIL) (*F) (*FAIL:arg)
        This pattern matches nothing and always fails. It can be used to force the engine to backtrack. It is equivalent to (?!), but easier to read. In fact, (?!) gets optimised into (*FAIL) internally. You can provide an argument so that if the match fails because of this FAIL directive the argument can be obtained from $REGERROR. It is probably useful only when combined with (?{}) or (??{}).
        My point was that I realized that
        (?{CODE})(?!)
        (?{CODE})(*F)
        are documented as equivalently forcing the engine to backtrack and are just what I was looking for. I don't call this "abuse", but YMMV.
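As a minimal sketch of that (?{CODE})(*F) idiom, the following collects the splits in an array instead of printing them. The helper name all_splits and the package variable @splits are assumptions added for illustration; the pattern itself is the one from the root of this thread. A package variable is used because, as far as I know, lexicals inside (?{ }) blocks were unreliable before Perl 5.18.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Filled by the (?{ ... }) block; a package variable so the code block
# sees it on older perls as well.
our @splits;

# Hypothetical helper: every way to cut $str into two non-empty runs of [ab].
sub all_splits {
    my ($str) = @_;
    local @splits = ();

    # (*FAIL) rejects each otherwise complete match, so the engine
    # backtracks into the two groups and the code block runs once per
    # split that reaches the end of the string.
    $str =~ /^ ([ab]+) ([ab]+) $ (?{ push @splits, "$1-$2" }) (*FAIL) /x;

    my @result = @splits;    # copy before local() restores @splits
    return @result;
}

print "$_\n" for all_splits('aba');
```

The match itself always returns false; only the side effect of the code block is of interest, which is exactly the point of contention about "use" versus "abuse".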

        The example with a character class was just historical and the one with (??{CODE}) was a result of my own ignorance.