http://qs321.pair.com?node_id=402651


in reply to With Unicode, \d is wrong if you mean [0-9]
in thread Regex help

No, \d will match digit characters in many languages (as TimToady mentioned). I think it's more accurate to say that it's wrong to mean [0-9], as letting people put in digits in whatever language they want is usually the right thing.

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re^2: With Unicode, \d is wrong if you mean [0-9]
by hv (Prior) on Oct 27, 2004 at 10:40 UTC

    Hmm, if you want to add one to it, it probably wants to consist of [0-9]+ rather than \d+.

    Hugo
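
Hugo's point can be checked with a small sketch. It assumes a Perl with Unicode-aware regexes, and the helper names (is_d_digits, is_ascii_digits, plus_one) are made up for illustration. The test string is "123" written in ARABIC-INDIC digits, built from \x{} escapes so nothing depends on the file's encoding.

```perl
use strict;
use warnings;

# "123" in ARABIC-INDIC DIGITs (U+0661..U+0663) and in plain ASCII.
my $arabic = "\x{0661}\x{0662}\x{0663}";
my $ascii  = "123";

# Hypothetical helper: does the whole string match \d+ ?
# For a string holding characters above U+00FF, \d means any Unicode decimal digit.
sub is_d_digits { my ($s) = @_; return $s =~ /\A\d+\z/ ? 1 : 0 }

# Hypothetical helper: does the whole string consist of ASCII digits only?
sub is_ascii_digits { my ($s) = @_; return $s =~ /\A[0-9]+\z/ ? 1 : 0 }

# Hypothetical helper: add one, silencing the "isn't numeric" warning
# that Perl would otherwise emit for the non-ASCII digits.
sub plus_one {
    my ($s) = @_;
    no warnings 'numeric';
    return $s + 1;
}

for my $s ($ascii, $arabic) {
    # Show code points instead of the raw string to avoid "Wide character" warnings.
    my $codes = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $s;
    printf "%-22s \\d+: %d  [0-9]+: %d  plus one: %s\n",
        $codes, is_d_digits($s), is_ascii_digits($s), plus_one($s);
}
```

The Arabic-Indic string passes the \d+ check but fails [0-9]+, and in numeric context it is treated as 0, so adding one gives 1 rather than 124. That is the mismatch between what the regex accepts and what numification understands.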

      That could be construed as a bug in Perl's internal grok_number() routine.

        Patches welcome. :)

        If anything, I think I'd rather see things like grok_number() become hooks so you can intercept them, whether to augment, replace, or just add:

        warn "Hey, scare's over now, you can go back to 2 digits" if $num =~ /^20\d\d\z/;

        Of course we probably don't reach grok_number() unless the lexer has already spotted a digit, so we'd need hot-pluggable grammars as well.

        Hugo