http://qs321.pair.com?node_id=812921



I think it's important to note that \1 is not a variable (which is why you can't use it outside of a regex);
But you can, sometimes, use it in the replacement part.
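A small sketch of the distinction being drawn here (the sample strings are made up for illustration). Inside the pattern, \1 is a backreference. After the match, and in the replacement part of s///, the captured text is read from $1.

```perl
use strict;
use warnings;

# Inside the pattern, \1 is a backreference: it must match the same text
# that group 1 captured earlier in this match attempt.
my $doubled = 'bookkeeper' =~ /(\w)\1/ ? $1 : undef;

# After the match (and in the replacement part of s///), the captured
# text lives in the variable $1 (and $2, ...).
(my $swapped = 'key=value') =~ s/(\w+)=(\w+)/$2=$1/;

print "first doubled letter: $doubled\n";
print "swapped: $swapped\n";
```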
the variable that contains the contents of the first capture group is $1, but that's empty until the capture has completed.
But in /([0-9]+){$1}/, the first capture is completed before the quantifier. So, that's not the reason.
For example, /\+32767.{32767}/ is rejected at compile time
Yes, but that's considered a bug. It's a restriction that should have been removed after the regexp engine was no longer recursive.
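The exact cap may differ between perl releases, so here is a hedged probe rather than a hard-coded number. It builds qr/.{N}/ at run time, which turns an oversized quantifier into a catchable error, and binary-searches for the largest N the running perl accepts. The helper names and the 2**31 - 1 search ceiling are arbitrary choices for this sketch.

```perl
use strict;
use warnings;

# True if a pattern with the quantifier {$n} compiles on the running perl.
# The pattern is interpolated, so it is compiled at run time and a
# "Quantifier in {,} bigger than ..." error can be trapped by eval.
sub quantifier_ok {
    my ($n) = @_;
    my $re = eval { qr/.{$n}/ };
    return defined $re ? 1 : 0;
}

# Binary search for the largest accepted {n} in [1, $upper].
sub max_quantifier {
    my ($upper) = @_;
    my ($lo, $hi) = (1, $upper);
    return $hi if quantifier_ok($hi);
    while ($hi - $lo > 1) {    # invariant: ok($lo) and !ok($hi)
        my $mid = int(($lo + $hi) / 2);
        if (quantifier_ok($mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $lo;
}

print 'largest accepted {n}: ', max_quantifier(2**31 - 1), "\n";
```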
“Why, then,” you ask, “is something like /(.)\1/, which suffers from the same compilation problem, OK?”
That's not the same problem. {...} is one of the mini-languages inside regular expressions. Compare it with [...]: [\1] doesn't refer back to something else either, because inside a character class \1 is an octal escape for chr(1).
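To make the [...] comparison concrete: inside a bracketed class, \1 is read as an octal escape, so it never refers back to group 1. The strings below are just test data.

```perl
use strict;
use warnings;

# Outside a character class, \1 is a backreference to group 1.
my $backref = ('aa' =~ /^(.)\1$/) ? 1 : 0;

# Inside [...], \1 is the octal escape for chr(1), not a backreference,
# so 'aa' fails while "a\x01" succeeds.
my $class_on_aa   = ('aa'    =~ /^(.)[\1]$/) ? 1 : 0;
my $class_on_ctrl = ("a\x01" =~ /^(.)[\1]$/) ? 1 : 0;

print "backref on 'aa':      $backref\n";
print "[\\1] on 'aa':         $class_on_aa\n";
print "[\\1] on \"a\\x01\":     $class_on_ctrl\n";
```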

But one can defer a subpattern. The syntax is (??{ }). This is what the OP wants, and this is what the OP ought to use.
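A minimal sketch of the (??{ }) approach. The OP's $bases isn't shown in this excerpt, so 'ACGT' below is an assumed stand-in, and the ^...$ anchors are added only so each example either matches completely or not at all.

```perl
use strict;
use warnings;

# Hypothetical alphabet; stands in for the OP's $bases.
my $bases = 'ACGT';

# (??{ CODE }) runs CODE during the match, after ([0-9]+) has captured,
# and matches the string CODE returns as a subpattern: here "[ACGT]{N}",
# where N is the count just captured in $1.
my $run = qr/^\+([0-9]+)(??{ "[$bases]{$1}" })$/;

for my $s ('+3ACG', '+3AC', '+3ACGT', '+2GT') {
    printf "%-7s %s\n", $s, ($s =~ $run ? 'match' : 'no match');
}
```

With those anchors, '+3ACG' and '+2GT' match, while '+3AC' (too short) and '+3ACGT' (too long) don't.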

Re^3: Regex fun
by JadeNB (Chaplain) on Dec 15, 2009 at 20:22 UTC
    But you can, sometimes, use it in the replacement part.
    Sure, but you're not supposed to: see “Warning on \1 Instead of $1” in perlre.
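A sketch of what that warning looks like in practice. The warning is raised while the s/// is compiled, so this compiles the substitution through a string eval with a __WARN__ handler installed; the test string is invented.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# The "\1 better written as $1" warning fires when the s/// is compiled,
# so compile it at run time with a string eval (q{} keeps \1 literal).
my $result = eval q{
    use warnings;
    my $s = 'abc';
    $s =~ s/(b)/\1\1/;   # behaves like s/(b)/$1$1/, but warns
    $s;
};
die $@ if $@;

print "result: $result\n";          # the substitution still happens
print "warning: $_" for @warnings;  # perl's message(s)
```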
    But in /([0-9]+){$1}/, the first capture is completed before the quantifier. So, that's not the reason.
    Sorry, I don't understand—not the reason for what?
    It's a restriction that should have been removed after the regexp engine was no longer recursive.
    Sorry, I don't understand this, either. Do you mean ‘re-entrant’? (UPDATE: Nope, just my internals-ignorance revealed. Thanks, ikegami!)
      Regarding the last point, the engine was re-engineered for 5.10. It used to use the C stack, so limits were imposed to prevent stack overflows. Now, the stack it uses is on the heap. The implementation moved away from a recursive model as part of the change.
      Sorry, I don't understand—not the reason for what?
      Quoting myself where I am quoting you:
      the variable that contains the contents of the first capture group is $1, but that's empty until the capture has completed.
      You're claiming $1 is "empty" until the capture has completed. I'm pointing out that in the case of the OP, said first capture has completed.
      Do you mean ‘re-entrant’?
      No, I don't. The current regexp-engine isn't re-entrant.
        You're claiming $1 is "empty" until the capture has completed. I'm pointing out that in the case of the OP, said first capture has completed.

        I guess that the quotes around ‘empty’ are to point out that, besides the unusual choice of word (in place of ‘undefined’), it's not true—sorry, I'll correct that.

        I agree that Hena's second solution doesn't suffer from the problem that I mentioned; but the post particularly asks for a single-regex solution, and I was just mentioning why the obvious substitute, /\+([0-9]+)[$bases]{$1}/, for the non-working regex /\+([0-9]+)[$bases]{\1}/, doesn't work. (Nobody suggested it anyway, so I guess it was pretty unclear what I was talking about.)
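The reason that substitute fails can be shown directly: a $1 written in a regex literal is interpolated when the pattern is built, before the match begins, so it holds the capture from the previous successful match, not the one from ([0-9]+). A sketch with made-up strings:

```perl
use strict;
use warnings;

# A successful match sets $1 to '2'.
'2' =~ /(\d)/ or die "setup match failed";

# The $1 below is interpolated when this pattern is built, i.e. before the
# match runs, so the pattern is effectively /^([0-9]+)c{2}$/.
my $three_cs = ('3ccc' =~ /^([0-9]+)c{$1}$/) ? 1 : 0;

'2' =~ /(\d)/ or die "setup match failed";   # make sure $1 is '2' again
my $two_cs = ('3cc' =~ /^([0-9]+)c{$1}$/) ? 1 : 0;

print "'3ccc' matched: $three_cs\n";   # 0: the pattern asked for 2 c's
print "'3cc'  matched: $two_cs\n";     # 1: matched using the stale $1
```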

        No, I don't. The current regexp-engine isn't re-entrant.

        Yes, which is why I thought that the final word in “the regexp engine was no longer recursive” might be ‘re-entrant’. :-) (I don't know enough history to know whether it ever was re-entrant, so, for all I knew, the grammar was correct.) I was particularly confused because Perl 5.10 newly allows for recursive regexes, which I confused with the regex engine itself being recursive; but ikegami clarified.
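For contrast, a sketch of what “recursive regexes” means at the pattern level: (?1) re-enters capture group 1, so the group can match nested copies of itself. This is a feature of the pattern language added in 5.10; it says nothing about whether the engine itself is implemented recursively. The balanced-parentheses task is just an illustrative choice.

```perl
use strict;
use warnings;

# Group 1 is "(" + any mix of non-paren runs and nested group-1 matches + ")".
# (?1) recurses into group 1; [^()]++ is possessive to avoid needless backtracking.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;

for my $s ('()', '(a(b)c)', '((x)(y))', '(a(b c)', ')(') {
    printf "%-9s %s\n", $s, ($s =~ $balanced ? 'balanced' : 'not balanced');
}
```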