http://qs321.pair.com?node_id=500475

diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

In the regexp /(?{ ... })XXX/, I would like to ensure that the code block is run. When that pattern is applied to " failed ", the optimizer quickly notes that XXX is a required literal that doesn't appear in the string, and rejects the match before my code block ever fires. Can I force the issue somehow?

" failed " =~ /(?{ $ok = 1 })XXX/;
print $ok ? "Good" : "Bad";

Compiling REx `(?{ $ok = 1 })XXX'
size 5 Got 44 bytes for offset annotations.
first at 1
   1: EVAL(3)
   3: EXACT <XXX>(5)
   5: END(0)
anchored `XXX' at 0 (checking anchored) minlen 3 with eval
Offsets: [5]
        1[14] 0[0] 15[3] 0[0] 18[0]
Guessing start of match, REx `(?{ $ok = 1 })XXX' against ` failed '...
Did not find anchored substr `XXX'...
Match rejected by optimizer

Replies are listed 'Best First'.
Re: Disabling regexp optimizations?
by japhy (Canon) on Oct 15, 2005 at 18:24 UTC
    I might be responsible for that optimization. I made some modifications to the regex engine a couple of years ago that let it skip the non-pattern parts of a regex when looking for things to optimize. So, how can you defeat it? Well, you could wrap the 'XXX' inside a (??{ ... }).
    $rx = qr/.../;
    " failed " =~ /(?{ $ok = 1 })(??{ $rx })/;
    print "OK = $ok\n";

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

      Even (??{ "XXX" }) is enough for that, it seems. Thanks.
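For anyone who wants to try it, here is a minimal sketch of the workaround discussed above. It assumes a perl where (?{ }) and (??{ }) are available in literal patterns; the variable $matched is only there for illustration. Because the literal is produced at match time by the deferred subpattern, the optimizer has no required substring to look for, so the leading code block gets to run even though the match as a whole fails.

```perl
use strict;
use warnings;

# Package variable, to stay clear of closure quirks in (?{ }) on older perls.
our $ok = 0;

# The literal XXX is hidden inside a deferred subpattern, so it is not
# visible to the optimizer when the pattern is compiled.
my $matched = " failed " =~ /(?{ $ok = 1 })(??{ "XXX" })/;

print "matched = ", ($matched ? 1 : 0), ", ok = $ok\n";
```

Running the same script under use re 'debug' should show whether the "Match rejected by optimizer" line is gone on your perl.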

Re: Disabling regexp optimizations?
by graff (Chancellor) on Oct 15, 2005 at 18:44 UTC
    I haven't dipped my toe into this deeper end of the regex pool yet, so I'm curious about the motivation behind your question. If I understand it right, you want an expression, which you are placing within (?{...}), to execute every time you evaluate this regex, regardless of whether or not the remainder of the regex yields an actual match.

    If that is the correct understanding, why would you want to do it that way, as opposed to this way:

    your_executable_expression; /XXX/
    That is, since this expression should execute in any case, just go ahead and do it outside the regex, then evaluate the (simpler) regex.

    If I have the wrong understanding, could someone explain what you get from /(?{...}).../ (without the optimization, as requested here) that you wouldn't get from running that statement outside the regex? (I just don't know.)

      The code in (?{...}) might have a desired side effect. Usually you use (?{...}) blocks only for their side effects because they don't normally impact whether the expression matches. In my case, I've implemented named captures and would like to pre-clear the targets just in case my capture expressions never get evaluated because something earlier failed. That is, CLEAR->stuff->CAPTURE. If "stuff" failed, then I wouldn't have had the opportunity to ensure that CAPTURE properly cleared the selected target.
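The CLEAR->stuff->CAPTURE idea could look roughly like the sketch below. This is not the actual named-capture implementation, which isn't shown in this thread; the hash %cap, the key 'word', and the (\w+) capture are hypothetical stand-ins, and the (??{ "XXX" }) trick from the other reply is borrowed so the clearing block is not skipped.

```perl
use strict;
use warnings;

# Hypothetical capture target, pre-loaded with a stale value.
our %cap = (word => 'stale');

# CLEAR -> stuff -> CAPTURE. The literal is wrapped in (??{ }) so the
# optimizer cannot reject the match before the clearing block runs.
my $re = qr/(?{ %cap = () })(??{ "XXX" })(\w+)(?{ $cap{word} = $^N })/;

# "stuff" fails here, but the target has still been cleared.
my $failed_match   = " failed " =~ $re;
my $stale_survived = exists $cap{word} ? 1 : 0;

# Here everything matches, and the capture block fills the target.
my $good_match = "XXXabc" =~ $re;
my $captured   = $cap{word};

print "after failed match: stale value ", ($stale_survived ? "kept" : "cleared"), "\n";
print "after good match: ", (defined $captured ? $captured : "undef"), "\n";
```

Note that assignments made inside (?{ }) are not undone on backtracking unless they are localized, which is what lets the clearing stick after a failed match.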

        I like your idea, but... that is not how (anonymous) captures in Perl work, is it? I mean, nothing is guaranteed about the values of $1, $2, etc. after a regexp with captures fails to match.
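A small sketch of the built-in behaviour being referred to: a failed match does not reset the capture variables, so $1 still holds whatever the last successful match in scope left there. The variable names are only for illustration.

```perl
use strict;
use warnings;

# A successful match sets $1.
"abc" =~ /(b)/ or die "first match should succeed";

# This match fails and leaves $1 untouched.
my $second     = "xyz" =~ /(q)/;
my $after_fail = $1;

print "second match = ", ($second ? 1 : 0), ", \$1 = $after_fail\n";
```

That is the stale-value situation the pre-clearing step above is meant to avoid for named targets.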