PerlMonks
Re^2: Using "negative" characters with the range operator. [Unicode::Collate] by kcott (Archbishop)
on Mar 13, 2017 at 07:44 UTC ( [id://1184367]=note )
G'day vrk,

"Besides, Unicode codepoints often aren't ordered alphabetically in any script, so you wouldn't get a sorted (collated) sequence even if it did."

[Note: There's no intended pedantry here; however, as I understand your statement, I believe you mean "characters", not "codepoints". On that basis, I don't disagree with your statement at all. The distinction is important for the remainder of my response.]

The core module Unicode::Collate can be used for sorting Unicode characters.
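Here's a minimal sketch of that difference, assuming the default collation rules that ship with Unicode::Collate. The character list is made up purely for illustration; it compares Perl's default sort (which orders by code point) with a collated sort.

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative sample only; "\x{E9}" is LATIN SMALL LETTER E WITH ACUTE.
my @chars = ('z', "\x{E9}", 'a', 'E', 'e', 'b');

# Default sort: string comparison, i.e. code point order.
my @by_code_point = sort @chars;

# Collated sort using the module's default collation table.
my @collated = Unicode::Collate->new->sort(@chars);

print "code point order: @by_code_point\n";
print "collated order:   @collated\n";
```

In code point order the uppercase "E" comes first and the accented "e" comes last; the collated order keeps all the "e" variants together.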
The code points are numerical values: a numerical sort is required for these.
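As a rough illustration of why the numeric comparison matters (the code point values below are arbitrary examples, not taken from the original thread), compare Perl's default string sort with a `<=>` sort:

```perl
use strict;
use warnings;

# Arbitrary sample code points, held as plain numbers.
my @code_points = (0x1F600, 0x41, 0xE9, 0x10FFFF, 0x263A);

# Default sort compares the decimal stringifications: wrong order.
my @as_strings = sort @code_points;

# Numeric comparison gives the intended order.
my @as_numbers = sort { $a <=> $b } @code_points;

print 'string sort:  ', join(' ', map { sprintf 'U+%04X', $_ } @as_strings), "\n";
print 'numeric sort: ', join(' ', map { sprintf 'U+%04X', $_ } @as_numbers), "\n";
```

The string sort puts U+10FFFF first because its decimal form, "1114111", compares lowest as a string.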
Code points are often presented as hexadecimal strings (that may have a leading "U+"). When dealing with these, it can be useful to first convert them to some canonical format. As the code point range is 0 .. 0x10ffff, an sprintf format including "%06x" or "%06X" handles all cases.
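One way this conversion might look is sketched below. The helper name `canonical_code_point` and the input list are hypothetical, and the validation rules (optional "U+" prefix in either case, one to six hex digits) are my assumptions rather than anything prescribed above.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise a hex code point string to six
# uppercase hex digits, accepting an optional leading "U+" or "u+".
sub canonical_code_point {
    my ($input) = @_;

    (my $hex = $input) =~ s/\A[Uu]\+//;

    die "Not a hexadecimal code point: '$input'\n"
        unless $hex =~ /\A[0-9A-Fa-f]{1,6}\z/;

    my $value = hex $hex;

    die "Code point out of range: '$input'\n"
        if $value > 0x10FFFF;

    return sprintf '%06X', $value;
}

# Made-up inputs in mixed formats.
my @raw = ('U+1F600', '41', 'u+e9', '10ffff', 'U+263A');

my @canonical = sort map { canonical_code_point($_) } @raw;

print "$_\n" for @canonical;
```

Because every canonical string has the same width and case, a plain string sort of them gives the same order as a numeric sort of the underlying values.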
— Ken