http://qs321.pair.com?node_id=1192777


in reply to Re^3: Sort undef
in thread Sort undef

Use Unicode. Perl is quite good at that.

  $ perl -MDP -we'my@x=("",undef,"1","123","2","\xff","\x{00ff}");DPeek for map{$_->[1]}sort{$a->[0]cmp$b->[0]}map{[$_//"\x{1ffff}",$_]}@x'
  PV(""\0)
  PV("1"\0)
  PV("123"\0)
  PV("2"\0)
  PV("\377"\0)
  PV("\377"\0)
  UNDEF
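Without the DP module, the same Schwartzian transform can be written as a plain core-Perl script. This is a minimal sketch; the sub name `sort_undef_last` is made up for illustration. undef is replaced by the sentinel "\x{1ffff}" only in the sort key, so the original undef is still what comes out at the end.

```perl
use strict;
use warnings;

# Hypothetical helper name; the body is the one-liner's Schwartzian transform.
sub sort_undef_last {
    return map  { $_->[1] }                      # 3. unwrap the original values
           sort { $a->[0] cmp $b->[0] }          # 2. string-compare the keys
           map  { [ $_ // "\x{1ffff}", $_ ] }    # 1. key: undef becomes the sentinel
           @_;
}

my @x = ("", undef, "1", "123", "2", "\xff", "\x{00ff}");

for my $v (sort_undef_last(@x)) {
    if (defined $v) {
        # Show each string as its code points, e.g. "[U+0031 U+0032]".
        print "[", join(" ", map { sprintf("U+%04X", ord($_)) } split //, $v), "]\n";
    }
    else {
        print "UNDEF\n";
    }
}
```

Note that "\xff" and "\x{00ff}" are the same one-character string, chr(255), which is why both show up as PV("\377"\0) in the DPeek output.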

And chr(255) is *not* by definition a y with two dots. That is only the case in these encodings supported by perl: cp1252, cp1254, cp1258, hp-roman8, iso-8859-1, iso-8859-9, iso-8859-14, iso-8859-15, iso-8859-16, and UTF-7. If you don't specify the encoding, or (lord forbid) *assume* one of those just listed, chr(255) could turn out to be any of:

  7bit-jis                       \xFF
  cp1006                         ﹽ      ARABIC SHADDA MEDIAL FORM
  cp1026                                APPLICATION PROGRAM COMMAND
  cp1047                                APPLICATION PROGRAM COMMAND
  cp1250                         ˙      DOT ABOVE
  cp1251                         я      CYRILLIC SMALL LETTER YA
  cp1252                         ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  cp1254                         ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  cp1256                         ے      ARABIC LETTER YEH BARREE
  cp1257                         ˙      DOT ABOVE
  cp1258                         ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  cp37                                  APPLICATION PROGRAM COMMAND
  cp424                                 APPLICATION PROGRAM COMMAND
  cp437                                 NO-BREAK SPACE
  cp500                                 APPLICATION PROGRAM COMMAND
  cp737                                 NO-BREAK SPACE
  cp775                                 NO-BREAK SPACE
  cp850                                 NO-BREAK SPACE
  cp852                                 NO-BREAK SPACE
  cp855                                 NO-BREAK SPACE
  cp856                                 NO-BREAK SPACE
  cp857                                 NO-BREAK SPACE
  cp858                                 NO-BREAK SPACE
  cp860                                 NO-BREAK SPACE
  cp861                                 NO-BREAK SPACE
  cp862                                 NO-BREAK SPACE
  cp863                                 NO-BREAK SPACE
  cp865                                 NO-BREAK SPACE
  cp866                                 NO-BREAK SPACE
  cp869                                 NO-BREAK SPACE
  cp875                                 APPLICATION PROGRAM COMMAND
  cp932                          
  cp936                          
  cp949                          
  cp950                          
  gsm0338                        ?      QUESTION MARK
  hp-roman8                      ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  iso-2022-jp                    \xFF
  iso-2022-jp-1                  \xFF
  iso-2022-kr                    \xFF
  iso-8859-1                     ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  iso-8859-10                    ĸ      LATIN SMALL LETTER KRA
  iso-8859-13                    ’      RIGHT SINGLE QUOTATION MARK
  iso-8859-14                    ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  iso-8859-15                    ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  iso-8859-16                    ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  iso-8859-2                     ˙      DOT ABOVE
  iso-8859-3                     ˙      DOT ABOVE
  iso-8859-4                     ˙      DOT ABOVE
  iso-8859-5                     џ      CYRILLIC SMALL LETTER DZHE
  iso-8859-9                     ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  koi8-f                         Ъ      CYRILLIC CAPITAL LETTER HARD SIGN
  koi8-r                         Ъ      CYRILLIC CAPITAL LETTER HARD SIGN
  koi8-u                         Ъ      CYRILLIC CAPITAL LETTER HARD SIGN
  MacArabic                      ے      ARABIC LETTER YEH BARREE
  MacCentralEurRoman             ˇ      CARON
  MacChineseSimp                 …      HORIZONTAL ELLIPSIS
  MacChineseTrad                 …      HORIZONTAL ELLIPSIS
  MacCroatian                    ˇ      CARON
  MacCyrillic                    €      EURO SIGN
  MacFarsi                       ے      ARABIC LETTER YEH BARREE
  MacGreek                              SOFT HYPHEN
  MacHebrew                      |      VERTICAL LINE
  MacIcelandic                   ˇ      CARON
  MacJapanese                    …
  MacKorean                      …
  MacRoman                       ˇ      CARON
  MacRomanian                    ˇ      CARON
  MacRumanian                    ˇ      CARON
  MacSami                        ǩ      LATIN SMALL LETTER K WITH CARON
  MacTurkish                     ˇ      CARON
  posix-bc                       ~      TILDE
  UTF-7                          ÿ      LATIN SMALL LETTER Y WITH DIAERESIS
  viscii                         Ữ      LATIN CAPITAL LETTER U WITH HORN AND TILDE
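To check what byte 0xFF means in a given encoding yourself, the core Encode module can decode it and charnames can name the result. A minimal sketch covering only a handful of the encodings listed; the helper name `byte_ff_as` is hypothetical, and this is not necessarily how the table above was produced. With Encode's default CHECK mode, a byte that has no mapping comes back as U+FFFD rather than blank.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Hypothetical helper: decode the single byte 0xFF under one encoding.
# Default CHECK mode: unmappable bytes become U+FFFD (REPLACEMENT CHARACTER).
sub byte_ff_as {
    my ($enc) = @_;
    my $octets = "\xFF";    # copy, so decode never touches a constant
    return decode($enc, $octets);
}

# A few of the encodings from the table above.
for my $enc (qw(iso-8859-1 cp1250 cp1251 cp437 koi8-r MacRoman)) {
    my $cp   = ord byte_ff_as($enc);
    my $name = charnames::viacode($cp) // "(no name)";
    printf "%-12s U+%04X %s\n", $enc, $cp, $name;
}
```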

Enjoy, Have FUN! H.Merijn

Re^5: Sort undef
by marinersk (Priest) on Jun 17, 2017 at 01:59 UTC

    Touché.

Re^5: Sort undef
by Anonymous Monk on Jun 17, 2017 at 04:03 UTC
    $ perl -MDP -we'my@x=("",undef,"1","123","2","\xff","\x{00ff}");DPeek for map{$_->[1]}sort{$a->[0]cmp$b->[0]}map{[$_//"\x{1ffff}",$_]}@x'
    Not sure what you're trying to prove with this. Do you think U+1ffff is the biggest Unicode character?
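For reference, the highest Unicode code point is U+10FFFF, and Perl strings can even hold code points above that, so no string sentinel is guaranteed to sort last. One alternative that avoids picking a sentinel at all is to compare definedness first and only then compare the strings. A sketch of that approach (the sub name is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: undef sorts last, defined values are compared with cmp.
sub sort_undef_last_nosentinel {
    return sort {
        # defined() returns 1 or "" (0 numerically); comparing $b before $a
        # puts defined values ahead of undef. If both sides are defined (or
        # both undef), fall through to a plain string comparison; the // ""
        # only keeps "use warnings" quiet for the undef-vs-undef case.
        (defined $b <=> defined $a) || (($a // "") cmp ($b // ""))
    } @_;
}

my @x = ("", undef, "1", "123", "2", "\x{10FFFF}", "\xff");

# Print each value's code points in hex ("%vX"), or UNDEF.
print join(" ", map { defined $_ ? sprintf("<%vX>", $_) : "UNDEF" }
                    sort_undef_last_nosentinel(@x)), "\n";
```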
    And chr(255) is *not* per definition an y with two dots.
    Which is exactly why I said I was *hoping* it would get replaced with something less than 255. People around here need less sensitive nerd-rage triggers.