http://qs321.pair.com?node_id=1193003


in reply to Re^4: Sort undef
in thread Sort undef

Perl strings are sequences of Unicode code points, not sequences of bytes. (Well, I think they're stored internally as plain bytes if possible.)
use Encode qw( encode_utf8 );
my $x = chr(1 << 63);
print length($x), "\n";
print length(encode_utf8($x)), "\n";
print "yep\n" if $x gt chr(255);
Output:
Use of code point 0x8000000000000000 is deprecated; the permissible max is 0x7FFFFFFFFFFFFFFF at foo line 2.
1
13
yep
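Because a string can compare greater than chr(255), a placeholder built from chr(255) characters is not guaranteed to sort last. One way around that is to test for undef in the comparator itself. The following is only a minimal sketch: the name undef_last and the sample data are made up for illustration, and it is not the custom sort from the script discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical comparator: defined values are compared as strings,
# and undef always sorts after every defined value.
sub undef_last {
    my ($l, $r) = @_;
    return  0 if !defined $l && !defined $r;
    return  1 if !defined $l;    # undef on the left goes after
    return -1 if !defined $r;    # undef on the right goes after
    return $l cmp $r;
}

my $wide     = chr(0x263A);      # any code point above 255 will do
my @unsorted = ('Dog', 'Cat', undef, $wide, 'Elephant', undef);
my @sorted   = sort { undef_last($a, $b) } @unsorted;
```

No sentinel string is involved, so the result does not depend on which code points appear in the data.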

Re^6: Sort undef
by marinersk (Priest) on Aug 15, 2017 at 10:48 UTC

    Going back to my attitude when I was a C programmer: It pays to know how your compiler thinks. (Needs adjustment for application to modern use of Perl, but the sentiment is the same.)

    Adding/replacing these three lines into my original script above:

    use Encode qw( encode_utf8 );
    my $x = chr(1 << 63);
    my @Unsorted = ( 'Dog', 'Cat', 'Bird', undef, $x, 'Elephant', undef, 'Lizard' );

    Yields:

    S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>perl .\sort011.pl
    -------------------------------------------------------------------------------
    Original:
    -------------------------------------------------------------------------------
    Dog
    Cat
    Bird
    (undef)
    Wide character in print at .\sort011.pl line 58.
     ÇêÇÇÇÇÇÇÇÇÇÇ
    Elephant
    (undef)
    Lizard
    -------------------------------------------------------------------------------
    -------------------------------------------------------------------------------
    Custom Sort:
    -------------------------------------------------------------------------------
    Bird
    Cat
    Dog
    Elephant
    Lizard
    (undef)
    (undef)
    Wide character in print at .\sort011.pl line 58.
     ÇêÇÇÇÇÇÇÇÇÇÇ
    -------------------------------------------------------------------------------
    S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>

    A string of chr(255) bytes longer than the longest item in the original array still fails to sort to the bottom; knowing that Unicode characters are stored differently than old-fashioned ASCII strings empowers the Perl programmer to make a better choice.

    Thank you for the information!

    I'd upvote the post, but there isn't any point, as it's Anonymous Monk.