in reply to Re: Sort undef
in thread Sort undef
That's bloody brilliant.
One question, though. To my eye it looks to be vulnerable to the case where the original list has at least one element which starts with two or more chr(255) characters and at least one element which is undef.
Or am I missing something?
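The concern can be checked with a small sketch. It assumes the approach under discussion is a Schwartzian Transform that substitutes chr(255) for undef as the sort key; the parent post is not shown here, so that is an assumption, and @list, @sentinel and @explicit are made-up names for illustration. The second sort compares definedness first and so needs no sentinel value at all.

```perl
use strict;
use warnings;

# Hypothetical data: one undef and one string starting with two chr(255).
my @list = ("b", undef, "\xff\xff", "a");

# Assumed approach: Schwartzian Transform with chr(255) standing in for undef.
my @sentinel = map  { $_->[1] }
               sort { $a->[0] cmp $b->[0] }
               map  { [ $_ // chr(255), $_ ] } @list;

# Alternative: order by definedness first, then by string value.
# The ternary avoids comparing two undefs, which would warn.
my @explicit = sort {
    (defined($b) <=> defined($a))
        || (defined($a) ? $a cmp $b : 0)
} @list;

# Report where undef ended up in each result.
my ($pos_sentinel) = grep { !defined $sentinel[$_] } 0 .. $#sentinel;
my ($pos_explicit) = grep { !defined $explicit[$_] } 0 .. $#explicit;

print "sentinel: undef at index $pos_sentinel of $#sentinel\n";  # index 2 of 3
print "explicit: undef at index $pos_explicit of $#explicit\n";  # index 3 of 3
```

With the sentinel key, "\xff\xff" compares greater than "\xff", so undef is no longer last. The definedness-first comparison does not depend on what the data contains.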
Replies are listed 'Best First'.
Re^3: Sort undef
by Anonymous Monk on Jun 13, 2017 at 07:12 UTC
by Tux (Canon) on Jun 14, 2017 at 09:26 UTC
Use Unicode. Perl is quite good at that.
And chr(255) is *not* per definition an y with two dots. That is only the case in these encodings supported by perl: cp1252, cp1254, cp1258, hp-roman8, iso-8859-1, iso-8859-9, iso-8859-14, iso-8859-15, iso-8859-16, and UTF-7. If you don't specify the encoding or (Lord forbid) *assume* one of those just listed, chr(255) can be any of:

    7bit-jis            \xFF
    cp1006              ﹽ  ARABIC SHADDA MEDIAL FORM
    cp1026                 APPLICATION PROGRAM COMMAND
    cp1047                 APPLICATION PROGRAM COMMAND
    cp1250              ˙  DOT ABOVE
    cp1251              я  CYRILLIC SMALL LETTER YA
    cp1252              ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    cp1254              ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    cp1256              ے  ARABIC LETTER YEH BARREE
    cp1257              ˙  DOT ABOVE
    cp1258              ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    cp37                   APPLICATION PROGRAM COMMAND
    cp424                  APPLICATION PROGRAM COMMAND
    cp437                  NO-BREAK SPACE
    cp500                  APPLICATION PROGRAM COMMAND
    cp737                  NO-BREAK SPACE
    cp775                  NO-BREAK SPACE
    cp850                  NO-BREAK SPACE
    cp852                  NO-BREAK SPACE
    cp855                  NO-BREAK SPACE
    cp856                  NO-BREAK SPACE
    cp857                  NO-BREAK SPACE
    cp858                  NO-BREAK SPACE
    cp860                  NO-BREAK SPACE
    cp861                  NO-BREAK SPACE
    cp862                  NO-BREAK SPACE
    cp863                  NO-BREAK SPACE
    cp865                  NO-BREAK SPACE
    cp866                  NO-BREAK SPACE
    cp869                  NO-BREAK SPACE
    cp875                  APPLICATION PROGRAM COMMAND
    cp932
    cp936
    cp949
    cp950
    gsm0338             ?  QUESTION MARK
    hp-roman8           ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    iso-2022-jp         \xFF
    iso-2022-jp-1       \xFF
    iso-2022-kr         \xFF
    iso-8859-1          ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    iso-8859-10         ĸ  LATIN SMALL LETTER KRA
    iso-8859-13         ’  RIGHT SINGLE QUOTATION MARK
    iso-8859-14         ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    iso-8859-15         ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    iso-8859-16         ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    iso-8859-2          ˙  DOT ABOVE
    iso-8859-3          ˙  DOT ABOVE
    iso-8859-4          ˙  DOT ABOVE
    iso-8859-5          џ  CYRILLIC SMALL LETTER DZHE
    iso-8859-9          ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    koi8-f              Ъ  CYRILLIC CAPITAL LETTER HARD SIGN
    koi8-r              Ъ  CYRILLIC CAPITAL LETTER HARD SIGN
    koi8-u              Ъ  CYRILLIC CAPITAL LETTER HARD SIGN
    MacArabic           ے  ARABIC LETTER YEH BARREE
    MacCentralEurRoman  ˇ  CARON
    MacChineseSimp      …  HORIZONTAL ELLIPSIS
    MacChineseTrad      …  HORIZONTAL ELLIPSIS
    MacCroatian         ˇ  CARON
    MacCyrillic         €  EURO SIGN
    MacFarsi            ے  ARABIC LETTER YEH BARREE
    MacGreek               SOFT HYPHEN
    MacHebrew           |  VERTICAL LINE
    MacIcelandic        ˇ  CARON
    MacJapanese         …
    MacKorean           …
    MacRoman            ˇ  CARON
    MacRomanian         ˇ  CARON
    MacRumanian         ˇ  CARON
    MacSami             ǩ  LATIN SMALL LETTER K WITH CARON
    MacTurkish          ˇ  CARON
    posix-bc            ~  TILDE
    UTF-7               ÿ  LATIN SMALL LETTER Y WITH DIAERESIS
    viscii              Ữ  LATIN CAPITAL LETTER U WITH HORN AND TILDE

Enjoy, Have FUN! H.Merijn
by marinersk (Priest) on Jun 17, 2017 at 01:59 UTC
Touché.
by Anonymous Monk on Jun 17, 2017 at 04:03 UTC

    $ perl -MDP -we'my@x=("",undef,"1","123","2","\xff","\x{00ff}");DPeek for map{$_->[1]}sort{$a->[0]cmp$b->[0]}map{[$_//"\x{1ffff}",$_]}@x'

Not sure what you're trying to prove with this. Do you think U+1ffff is the biggest Unicode character?

    And chr(255) is *not* per definition an y with two dots.

Which is exactly why I said I was *hoping* it would get replaced with something less than 255. People around here need less sensitive nerd-rage triggers.
by marinersk (Priest) on Jun 14, 2017 at 08:17 UTC
It would seem that deaccent() would modify the data to a sub-255 value, leaving a single 255 in the Schwartzian Transform as a viable sort max key. As noted above, this should be proven before being deployed. As to your other note, Unicode characters "above 255" are actually multi-byte sequences whose individual bytes still cannot exceed the architectural limitation of chr(255), so I question that perceived vulnerability.
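Whether that holds depends on whether the strings being sorted are decoded characters or encoded bytes. A minimal check, assuming plain cmp with no locale in effect and using the core Encode module; U+0100 is just an arbitrary example of a character above 255, and the variable names are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $max_byte = chr(255);     # the proposed sentinel key
my $char     = chr(0x100);   # U+0100 as a decoded Perl character

# As a decoded string this is one character, compared by code point.
my $as_chars = $char cmp $max_byte;

# As UTF-8 encoded bytes it is "\xC4\x80", and every byte is below 0xFF.
my $as_bytes = encode("UTF-8", $char) cmp $max_byte;

printf "length=%d chars=%d bytes=%d\n", length($char), $as_chars, $as_bytes;
# prints: length=1 chars=1 bytes=-1
```

So a chr(255) key stays the maximum only while the data is kept as encoded bytes. Once the input has been decoded to characters, anything above U+00FF sorts after it.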
by Anonymous Monk on Jun 17, 2017 at 03:50 UTC
Output:
by marinersk (Priest) on Aug 15, 2017 at 10:48 UTC