in reply to check for power of a number with regex

There is no "plain" regex that correctly determines whether a string's length is a square, or whether it is a power of some fixed number. The corresponding languages are not regular; proving this for each one is a standard homework problem in a first theory of computation course.

Thus under your constraints, the key to such a regex must be in clever use of backreferences. But it's not clear how backrefs would be useful. Backrefs can only really handle linear relations in the length of strings, while exponentiation is highly non-linear.
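To make the "linear relations" remark concrete, here is a minimal sketch of the kind of length constraint a backreference can express on a string of 1s. The helper names is_even_length and is_3k_plus_1 are made up for illustration and do not come from the original question.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# (1*)\1 forces the string to be two copies of the same block,
# so the total length is 2k for some k >= 0.
sub is_even_length {
    my ($s) = @_;
    return $s =~ /^(1*)\1$/ ? 1 : 0;
}

# Three copies of one block plus a single extra 1: length 3k + 1.
# \g{1} is the unambiguous spelling of \1 (available since Perl 5.10).
sub is_3k_plus_1 {
    my ($s) = @_;
    return $s =~ /^(1*)\g{1}\g{1}1$/ ? 1 : 0;
}

print is_even_length("1111") ? "match\n" : "no match\n";   # prints "match"
```

Each repetition of a captured group adds a fixed multiple of that group's length, so the overall length is a linear combination of the group lengths. Nothing in these two patterns ties the length to an exponent.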


Re^2: check for power of a number with regex
by JavaFan (Canon) on Oct 26, 2009 at 08:29 UTC
    Because 5.10 regular expressions have named captures you can use as rules, languages not being regular doesn't mean they cannot be matched by a Perl regexp. With named captures and rules, Perl regexes can match any context-free grammar. With backreferences, even more.
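A minimal sketch of named captures used as grammar rules, assuming Perl 5.10 or later. It matches the textbook non-regular language a^n b^n (n >= 1); the rule name S and the helper matches_anbn are illustrative choices, not anything from the thread.

```perl
use strict;
use warnings;
use 5.010;

# One grammar rule, S -> a S b | a b, written as a named group.
# (?(DEFINE)...) declares the group without matching anything itself;
# (?&S) calls it recursively, like a nonterminal in a grammar.
my $anbn = qr{
    ^ (?&S) $
    (?(DEFINE)
        (?<S> a (?&S)? b )
    )
}x;

# Hypothetical helper wrapping the pattern.
sub matches_anbn {
    my ($s) = @_;
    return $s =~ $anbn ? 1 : 0;
}

print matches_anbn("aaabbb") ? "match\n" : "no match\n";   # prints "match"
```

No classical regular expression can match this language, yet the recursive pattern handles it without (?{ }) or (??{ }).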

    Now, I don't think the language {1^n | n = b^c, c > 1} is context-free, but that's harder to prove than it being non-regular. And it's still not sufficient to prove it cannot be matched by a Perl regexp without the use of (?{ }) or (??{ }).

      Any context-free language over a single-character alphabet is also regular (which can be shown using, for example, Parikh's theorem). So since this "language of exponentials" is non-regular, it is also non-context-free. Another way to look at it is that the pumping lemma for CFLs essentially collapses into the pumping lemma for regular languages when applied to single-character alphabets, because the concatenation operator is commutative on strings over a single-character alphabet.
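One way to see the non-regularity claimed above: a regular language over a single character has an eventually periodic set of lengths, so if it is infinite, the gaps between consecutive accepted lengths are bounded. The sketch below (power_gaps is a hypothetical helper, added only for illustration) prints the gaps between consecutive powers of a base, which keep growing.

```perl
use strict;
use warnings;

# Hypothetical helper: gaps between consecutive powers
# base^1, base^2, ..., base^max_exp.
sub power_gaps {
    my ($base, $max_exp) = @_;
    my @powers = map { $base ** $_ } 1 .. $max_exp;
    return map { $powers[$_] - $powers[$_ - 1] } 1 .. $#powers;
}

print join(", ", power_gaps(2, 6)), "\n";   # prints "2, 4, 8, 16, 32"
```

Since the gaps are unbounded, the set of lengths cannot be eventually periodic, which is the non-regularity that the Parikh argument then lifts to non-context-freeness.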

      I do stand by my intuition that backrefs alone (i.e., classical regexes + backrefs) won't help. But you are right, I am not able to prove it -- there is just no formal model that I'm aware of that exactly captures the expressivity of those operations, that would be amenable to impossibility proofs. I admit I hadn't thought of the new named captures & rules from 5.10 (I'm a bit behind the times). Clearly the named captures alone won't get you the regex desired in this case, but maybe some clever combination of both named rules & backrefs? I remain slightly skeptical but open-minded ;)