Re^2: use locale broken?



by december (Pilgrim)

on Mar 17, 2011 at 18:01 UTC [id://893824]


in reply to Re: use locale broken?
in thread use locale broken?

I was hoping to have it work whether the user's (shell) encoding is ISO-8859-1 or UTF-8. Maybe I'm better off forcibly converting all input and output to UTF-8 and having the code itself deal with Unicode only.

I still feel this is a bug in Perl, though.

Is there a way (perhaps a debugging option) to see what \w matches?


Re^3: use locale broken? (\w)
by ikegami (Patriarch) on Mar 17, 2011 at 19:12 UTC

Maybe I'm better off forcefully converting all input and output to UTF-8

Yes. For many reasons, it is best to decode all input and encode all output.
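A minimal sketch of that approach, assuming the core Encode and I18N::Langinfo modules: bytes are decoded into Perl character strings at the edges, the program works only with characters, and text is encoded again on the way out. decode_input, encode_output and locale_codeset are hypothetical helper names. The codeset name that langinfo(CODESET) returns is platform-dependent, and the sketch assumes Encode recognizes it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: the character encoding of the user's current locale,
# e.g. "UTF-8" or "ISO-8859-1" (exact spelling depends on the platform).
sub locale_codeset { return langinfo(CODESET) }

# Hypothetical helper: turn raw input bytes into a Perl character string.
# FB_CROAK makes malformed input die instead of being silently replaced.
sub decode_input {
    my ($bytes, $codeset) = @_;
    return decode($codeset, $bytes, Encode::FB_CROAK());
}

# Hypothetical helper: turn a Perl character string back into output bytes.
sub encode_output {
    my ($text, $codeset) = @_;
    return encode($codeset, $text, Encode::FB_CROAK());
}

# The same word arrives as different bytes depending on the shell encoding.
my $latin1_bytes = "caf\xE9";        # "café" as ISO-8859-1 bytes
my $utf8_bytes   = "caf\xC3\xA9";    # "café" as UTF-8 bytes

my $from_latin1 = decode_input($latin1_bytes, 'ISO-8859-1');
my $from_utf8   = decode_input($utf8_bytes,   'UTF-8');

# After decoding, both are the same 4-character string.
my $same = $from_latin1 eq $from_utf8;
print "same string: ", ($same ? 'yes' : 'no'), "\n";
print "length: ", length($from_utf8), "\n";

# decode() returns a character string, so \w uses Unicode rules here
# and accepts the e-acute without any "use locale".
print "all word characters: ", ($from_utf8 =~ /\A\w+\z/ ? 'yes' : 'no'), "\n";
```

For real streams the same idea is usually expressed with PerlIO layers, e.g. binmode(STDIN, ':encoding(' . locale_codeset() . ')') and the same for STDOUT, so that every read is decoded and every print is encoded.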

I still feel this is a bug in Perl, though.

I believe Perl doesn't support multi-byte locales (e.g. UTF-8).

Effort is being put into Unicode support instead of into extending the locale system.

Is there a way (perhaps a debugging option) to see what \w matches?

perlre: Match a "word" character (alphanumeric plus "_").
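A brute-force way to see what \w matches is simply to test every code point in a range. The sketch below assumes Perl 5.14 or later, for the /u (Unicode rules) and /a (ASCII only) regex modifiers, which postdate the 5.12 discussed here; matching_codepoints is a hypothetical helper name.

```perl
use 5.014;      # needed for the /a and /u modifiers; also enables strict and say
use warnings;

# Hypothetical helper: code points in $from..$to whose character matches $re.
sub matching_codepoints {
    my ($re, $from, $to) = @_;
    return grep { chr($_) =~ $re } $from .. $to;
}

# \w with Unicode rules (/u) versus \w restricted to ASCII (/a).
my @unicode_word = matching_codepoints(qr/\w/u, 0x00, 0xFF);
my @ascii_word   = matching_codepoints(qr/\w/a, 0x00, 0xFF);

printf "Unicode rules: %d code points in U+0000..U+00FF match \\w\n", scalar @unicode_word;
printf "ASCII rules:   %d code points in U+0000..U+00FF match \\w\n", scalar @ascii_word;

# List the non-ASCII characters that only the Unicode rules accept.
binmode STDOUT, ':encoding(UTF-8)';
say join ' ', map { sprintf 'U+%04X %s', $_, chr $_ } grep { $_ > 0x7F } @unicode_word;
```

Widening the range to 0x10FFFF gives the full list, at the cost of a slower loop.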

The following are equivalent:

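(No, this is wrong: \w also matches marks (\p{M}) and connector punctuation (\p{Pc}), which [_\p{Alnum}] does not, so /\w/ is equivalent to /\p{Word}/ but not to the last two patterns.)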

/\w/                   # When not under "use locale" and not restricted to ASCII
/\p{Word}/
/[_\p{Alnum}]/
/[_\p{Alphabetic}\p{Nd}]/
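A quick check of where the list breaks down: the two non-letter characters below are matched by \w and \p{Word} but not by [_\p{Alnum}]. The choice of U+0301 (a combining mark) and U+203F (connector punctuation) as test characters is mine, and classify is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: test one character against three of the patterns above.
sub classify {
    my ($char) = @_;
    return {
        w     => ($char =~ /\A\w\z/)           ? 1 : 0,
        word  => ($char =~ /\A\p{Word}\z/)     ? 1 : 0,
        alnum => ($char =~ /\A[_\p{Alnum}]\z/) ? 1 : 0,
    };
}

my @samples = (
    [ "a",        'U+0061 LATIN SMALL LETTER A' ],
    [ "\x{0301}", 'U+0301 COMBINING ACUTE ACCENT (Mn, a mark)' ],
    [ "\x{203F}", 'U+203F UNDERTIE (Pc, connector punctuation)' ],
);

for my $sample (@samples) {
    my ($char, $name) = @$sample;
    my $r = classify($char);
    printf "%-45s \\w=%d  \\p{Word}=%d  [_\\p{Alnum}]=%d\n",
        $name, $r->{w}, $r->{word}, $r->{alnum};
}
```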

Alphabetic: derived property "Alphabetic" (100,520 code points in Perl 5.12.2).
Nd: Unicode general category "Nd", decimal digits (411 code points in Perl 5.12.2).

Actual lists vary by version of Unicode and thus by version of Perl.
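To see the numbers for the Perl you are actually running, you can count matching code points by brute force. A sketch, assuming Perl 5.14 or later (for the /u modifier, so that \w uses Unicode rules even on code points below 256); count_matching is a hypothetical helper name, and the counts it prints depend on the Unicode version bundled with your Perl.

```perl
use 5.014;              # for /u; also enables strict
use warnings;
use Unicode::UCD ();

# Hypothetical helper: count the code points in U+0000..U+10FFFF that match $re.
# Brute force, so expect it to take a few seconds.
sub count_matching {
    my ($re) = @_;
    my $n = 0;
    no warnings 'utf8';    # silence any warnings about surrogates/noncharacters
    for my $cp (0 .. 0x10FFFF) {
        $n++ if chr($cp) =~ $re;
    }
    return $n;
}

my %count = (
    Alphabetic => count_matching(qr/\p{Alphabetic}/),
    Nd         => count_matching(qr/\p{Nd}/),
    Word       => count_matching(qr/\w/u),
);

printf "Perl %vd, Unicode %s\n", $^V, Unicode::UCD::UnicodeVersion();
printf "%-10s %9d code points\n", $_, $count{$_} for sort keys %count;
```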


In Section Seekers of Perl Wisdom
