Re^2: Regex help

in reply to Re: Regex help
in thread Regex help

That would match B4๖:

    m/B(?:3[89]|4\d)/ and print for "B4\x{E56}"

With the arrival of Unicode, it's wrong to use \d if you mean [0-9].
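To make the difference concrete, here is a small sketch that runs the same test string through three variants of the pattern. The variable names are only illustrative, and the `/a` modifier is an assumption about your perl: it is available from 5.14 onward, so on older perls only the `[0-9]` form applies.

```perl
use 5.014;      # needed for the /a modifier
use strict;
use warnings;

my $s = "B4\x{E56}";    # "B4" followed by THAI DIGIT SIX (U+0E56)

# \d accepts any Unicode decimal digit, so this matches.
my $unicode_d  = $s =~ /B(?:3[89]|4\d)/    ? 1 : 0;

# [0-9] accepts only the ten ASCII digits, so this does not match.
my $ascii_only = $s =~ /B(?:3[89]|4[0-9])/ ? 1 : 0;

# /a restricts \d (and \w, \s) to their ASCII meanings.
my $with_a     = $s =~ /B(?:3[89]|4\d)/a   ? 1 : 0;

print "unicode_d=$unicode_d ascii_only=$ascii_only with_a=$with_a\n";
# prints: unicode_d=1 ascii_only=0 with_a=0
```

Spelling out `[0-9]` is the portable choice; `/a` is handy when a pattern has many `\d` and you want them all restricted at once.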


With Unicode, \d is wrong if you mean [0-9] by Happy-the-monk (Canon) on Oct 26, 2004 at 15:04 UTC
"With the arrival of Unicode, it's wrong to use \d if you mean [0-9]."

This worries me. What happened when I wasn't paying attention? Will most of my code that processes text containing digits break as soon as the input contains Unicode?

Cheers, Sören
Re: With Unicode, \d is wrong if you mean [0-9] by TimToady (Parson) on Oct 26, 2004 at 15:40 UTC
That depends on how desperately you want to prevent people from typing in numerals in:

    Arabic Devanagari Bengali Gurmukhi Gujarati Oriya
    Tamil Telugu Kannada Malayalam Thai Lao
    Tibetan Myanmar Ethiopic Khmer Mongolian Limbu
    Chinese Japanese Korean Vietnamese

But it's not like `\d` is going to start throwing exceptions merely because you feed it Unicode.
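As a rough illustration of that last point, the sketch below feeds `\d` the digit zero from a few scripts. The hash and its labels are made up for the example, and only four code points are sampled, so it says nothing about every script in the list above.

```perl
use strict;
use warnings;

# Digit zero in a handful of scripts (labels are illustrative only).
my %zero = (
    'ASCII'        => "\x{0030}",
    'Arabic-Indic' => "\x{0660}",
    'Devanagari'   => "\x{0966}",
    'Thai'         => "\x{0E50}",
);

my %matched;
for my $script (sort keys %zero) {
    # No exception is raised here; \d simply matches or it doesn't.
    $matched{$script} = $zero{$script} =~ /^\d$/ ? 1 : 0;
    print "$script: ", ($matched{$script} ? "matched" : "not matched"), "\n";
}
```

Every sampled character matches, and the loop runs to completion without dying. Only the script names are printed, which avoids "Wide character" warnings on a plain STDOUT.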
Re: With Unicode, \d is wrong if you mean [0-9] by olivierp (Hermit) on Oct 26, 2004 at 15:44 UTC
I'd say it depends on what you mean by "breaks". Your code will still match [0-9] as you are used to, but will also match other characters defined as "digits" in other "scripts". If you depend on a "Latin" digit elsewhere in the code, I think you may have undesired side effects.

-- Olivier
Re: With Unicode, \d is wrong if you mean [0-9] by hardburn (Abbot) on Oct 26, 2004 at 15:46 UTC
No, \d will match digit characters in many languages (as TimToady mentioned). I think it's more accurate to say that it's wrong to mean `[0-9]`, as letting people put in digits in whatever language they want is usually the right thing.

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
Re^2: With Unicode, \d is wrong if you mean [0-9] by hv (Prior) on Oct 27, 2004 at 10:40 UTC
Hmm, if you want to add one to it, it probably wants to consist of `[0-9]+` rather than `\d+`.

Hugo
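A minimal sketch of why arithmetic is the sticking point: Perl's string-to-number conversion only understands ASCII digits, so a capture made with `\d+` can silently numify to 0. The `id=` input strings are hypothetical, and the numeric warning is switched off locally just to keep the output clean.

```perl
use strict;
use warnings;

my $thai_six = "\x{E56}";              # THAI DIGIT SIX
my $input    = "id=$thai_six";         # hypothetical input

my ($loose)  = $input =~ /(\d+)/;      # captures the Thai digit
my ($strict) = $input =~ /([0-9]+)/;   # no match, so $strict is undef

# The Thai digit is not numeric as far as Perl arithmetic is concerned,
# so it numifies to 0 (normally with an "isn't numeric" warning).
my $sum = do {
    no warnings 'numeric';
    $loose + 1;
};
print "loose capture + 1 = $sum\n";    # prints 1, not 7
print "strict capture is ", (defined $strict ? "defined" : "undef"), "\n";

# With ASCII input the strict pattern behaves as expected.
my ($n) = "id=41" =~ /([0-9]+)/;
print "ASCII capture + 1 = ", $n + 1, "\n";   # prints 42
```

If you do want to accept digits from other scripts and still compute with them, the captured text has to be converted first; recent perls ship a `num()` function in Unicode::UCD that may serve for this.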
Re^3: With Unicode, \d is wrong if you mean [0-9] by TimToady (Parson) on Oct 27, 2004 at 17:13 UTC
Re^4: With Unicode, \d is wrong if you mean [0-9] by hv (Prior) on Oct 28, 2004 at 02:45 UTC

In Section Seekers of Perl Wisdom