http://qs321.pair.com?node_id=554443

kidongrok has asked for the wisdom of the Perl Monks concerning the following question:

DB<1> $a="abcdabcd"
DB<2> p $a =~ /[0-9a-f]{4|8}/

DB<3> p $a =~ /[0-9a-f]{4}/
1
DB<4> p $a =~ /[0-9a-f]{8}/
1

NB-the display is eating the [] square brackets up there.

Lots can follow from this:
- why is there no error on the 1st line?
- am I really the 1st? ;-)
- what does it mean currently? (this I can answer)

 1: BRANCH(15)
 2: ANYOF[0-9a-f](13)
13: EXACT <{4>(18)
15: BRANCH(18)
16: EXACT <8}>(18)
18: END(0)

That's not what I expected. Wouldn't you tacitly expect Perl to do something similar to what it does with {4,8} there? Or is that ambiguous?

Given the availability of \, which is needed to get a literal reading of the []() chars, what's the harm?

(He baits the hook, chums the water... Don't we have some regex-monsters lurking here?)

Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: what means this regex? $x = qr/[0-9a-f]{4|8}/
by reasonablekeith (Deacon) on Jun 09, 2006 at 08:38 UTC
    Regarding
    {4|8}
    This is a nice idea, but it doesn't do anything special. A curly bracket is only a special character when it is found in one of these forms: {n}, {n,} or {n,m}.

    As your example isn't like this, the bracket is just matched as a plain character. What your first regex shows is just an alternation, equivalent to the following...

    if (/[0-9a-f]\{4/ or /8\}/) { print "matched\n"; }
    Note that I've escaped the curly brackets just to be explicit; it's not actually necessary.
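A minimal sketch of that literal reading, assuming a few sample strings that are not from the original post. The braces are escaped here so the intent is explicit:

```perl
use strict;
use warnings;

# Same alternation as /[0-9a-f]{4|8}/, with the braces escaped:
# either a hex character followed by the literal text "{4",
# or the literal text "8}".
my $re = qr/[0-9a-f]\{4|8\}/;

# Sample inputs are illustrative only.
for my $s ('abcdabcd', 'a{4', '8}', 'abcd') {
    printf "%-10s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
}
```

Only the strings that actually contain the brace text should match; the plain hex strings should not.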
    ---
    my name's not Keith, and I'm not reasonable.
Re: what means this regex? $x = qr/[0-9a-f]{4|8}/
by Zaxo (Archbishop) on Jun 09, 2006 at 08:30 UTC

    Did you expect to get bitwise-or evaluated to match twelve characters? Or did you expect regex alternation to get it to match exactly four or exactly eight?

    I think the regex engine gave up on compiling a quantifier when it hit incompatible syntax. That would leave the braces as literal.

    After Compline,
    Zaxo

      That was a poorly worded question you responded to. To be clear, it was a musing on a shorter way to write "4 or 8 hex chars" than the other, longer ways. You saw a 2nd interpretation that I had dismissed:
    • Doing bitwise-or: {4|8} = {12}; gives only 1 match length, which is already doable with '{12}'
    • Doing match-length-alternation, insofar as it allows multiple lengths to be given, seems more useful.
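For the specific "4 or 8 hex chars" case, one way to get the length alternation with existing syntax is to quantify a group of four. This is only a sketch: it assumes the whole string is being validated (hence the \A and \z anchors), and it works here only because 8 is a multiple of 4.

```perl
use strict;
use warnings;

# One or two groups of four hex characters, i.e. exactly 4 or exactly 8.
# The anchors are an assumption: they require the whole string to match.
my $re = qr/\A(?:[0-9a-f]{4}){1,2}\z/;

# Sample inputs are illustrative only.
for my $s ('abcd', 'abcdabcd', 'abcdab', 'abcdabcdabcd') {
    printf "%-14s %s\n", $s, ($s =~ $re ? 'ok' : 'rejected');
}
```

Lengths that are not multiples of a common unit would still need an explicit alternation.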

        If alternation worked in quantifiers, you'd want to put the eight first. The regex engine may be greedy, but it's also hasty. As soon as it matches an alternate it forgets about the remaining ones. Anything that would match the eight has already matched the four.

        Translating to the intended regex,

        $_ = "abc" x 4;
        $re_short = qr/([0-9a-f]{4}|[0-9a-f]{8})/;
        $re_long  = qr/([0-9a-f]{8}|[0-9a-f]{4})/;
        print $1, $/ if /$re_short/;
        print $1, $/ if /$re_long/;
        __END__
        abca
        abcabcab
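The ordering matters because nothing after the alternation forces the engine to reconsider its first choice. As a hedged sketch, assuming the string is meant to be matched in full, anchoring both ends lets the shorter branch fail and the engine backtrack into the longer one:

```perl
use strict;
use warnings;

my $s = 'abcabcab';    # eight hex characters, chosen for illustration

# Unanchored: the first alternative that matches wins.
my ($short_first) = $s =~ /([0-9a-f]{4}|[0-9a-f]{8})/;

# Anchored at both ends: the four-character branch fails at \z,
# so the engine backtracks and tries the eight-character branch.
my ($anchored) = $s =~ /\A([0-9a-f]{4}|[0-9a-f]{8})\z/;

print "$short_first\n";    # abca
print "$anchored\n";       # abcabcab
```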

        After Compline,
        Zaxo