PerlMonks  

Re^2: Using "negative" characters with the range operator. [Unicode::Collate]

by kcott (Archbishop)
on Mar 13, 2017 at 07:44 UTC [id://1184367]


in reply to Re: Using "negative" characters with the range operator.
in thread Using "negative" characters with the range operator.

G'day vrk,

"Besides, Unicode codepoints often aren't ordered alphabetically in any script, so you wouldn't get a sorted (collated) sequence even if it did."

[Note: There's no intended pedantry here; however, as I understand your statement, I believe you mean "characters", not "codepoints". On that basis, I don't disagree with your statement, at all. The distinction is important for the remainder of my response.]

The core module Unicode::Collate can be used for sorting Unicode characters.

$ perl -E 'say for sort qw{z é a}'
a
z
é
$ perl -MUnicode::Collate -E 'say for Unicode::Collate::->new->sort(qw{z é a})'
a
é
z
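For anyone who prefers a script to a one-liner, here is a minimal sketch of the same comparison. It assumes the source file is saved as UTF-8 and that the default collation table shipped with Unicode::Collate is good enough; the variable names are just for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # assumption: this file is saved as UTF-8
use open qw{:std :encoding(UTF-8)};    # encode STDOUT/STDERR as UTF-8
use Unicode::Collate;

my @chars = qw{z é a};

# The default sort compares strings by code point,
# so "é" (U+00E9) lands after "z" (U+007A).
my @by_code_point = sort @chars;

# Unicode::Collate applies the Unicode Collation Algorithm,
# which places "é" next to "e", i.e. between "a" and "z".
my @collated = Unicode::Collate->new->sort(@chars);

print "@by_code_point\n";   # a z é
print "@collated\n";        # a é z
```

For language-specific ordering, Unicode::Collate::Locale is worth a look; the sketch above only uses the default table.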

The code points are numerical values, and the default sort compares strings, so a numerical sort is required for these.

$ perl -E 'say for sort map { ord } qw{z é a}'
122
195
97
$ perl -E 'say for sort { $a <=> $b } map { ord } qw{z é a}'
97
122
195
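One caveat about the numbers: the 195 shown above is 0xC3, the first byte of the UTF-8 encoding of "é", because the one-liner sees undecoded bytes. The code point itself is U+00E9 (233). A minimal sketch of the same two sorts with the source decoded via the utf8 pragma (again assuming the file is saved as UTF-8):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;   # without this, "é" is two bytes and ord returns 195, the first byte

my @chars       = qw{z é a};
my @code_points = map { ord } @chars;                  # (122, 233, 97)

# Default sort compares the numbers as strings: "122" lt "233" lt "97".
my @as_strings = sort @code_points;

# The <=> operator compares them as numbers.
my @as_numbers = sort { $a <=> $b } @code_points;

print "@as_strings\n";   # 122 233 97
print "@as_numbers\n";   # 97 122 233
```

The point about string versus numerical sorting is unchanged; only the value for "é" differs.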

Code points are often presented as hexadecimal strings (that may have a leading "U+"). When dealing with these, it can be useful to first convert them to some canonical format. As the code point range is 0 .. 0x10ffff, an sprintf format including "%06x" or "%06X" handles all cases.

$ perl -E 'say sprintf "U+%06X", $_ for map { ord } qw{z é a}'
U+00007A
U+0000C3
U+000061
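Going the other way, from a hexadecimal string to the canonical format, could look something like the sketch below. The helper name canonical_code_point is hypothetical, and the accepted prefixes ("U+" or "0x", in either case) are my assumption about what the input might look like.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: normalise a hexadecimal code point string
# (optionally prefixed with "U+" or "0x") to the form "U+XXXXXX".
# Returns undef (in scalar context) if the input is not a valid code point.
sub canonical_code_point {
    my ($input) = @_;

    my ($hex) = $input =~ /\A(?:U\+|0x)?([0-9a-f]{1,6})\z/i
        or return;                      # not a recognisable hex string

    my $value = hex $hex;
    return if $value > 0x10FFFF;        # beyond the code point range

    return sprintf 'U+%06X', $value;
}

print canonical_code_point($_), "\n" for qw{7a U+E9 0x61 u+1f600};
# U+00007A
# U+0000E9
# U+000061
# U+01F600
```

Once every value is in the same fixed-width, uppercase format, a plain string sort also orders them correctly.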

— Ken

Replies are listed 'Best First'.
Re^3: Using "negative" characters with the range operator. [Unicode::Collate]
by vrk (Chaplain) on Mar 13, 2017 at 09:49 UTC

    Yes indeed! Thanks for the clarification.
