PerlMonks  

Re^4: Sort undef

by marinersk (Priest)
on Jun 14, 2017 at 08:17 UTC (#1192772)


in reply to Re^3: Sort undef
in thread Sort undef

It would seem that deaccent() would reduce the data to values below 255, leaving a single chr(255) as a viable maximum sort key in the Schwartzian Transform; as noted above, this should be proven before it is deployed.
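For readers following along, here is a minimal sketch of the sentinel-key idea described above. It is not the original script from this thread; the names `@unsorted` and `$max_key` and the sample data are illustrative assumptions, and the approach only holds if every real key starts with a character below chr(255).

```perl
use strict;
use warnings;

# Hypothetical sample data; undef marks a missing value.
my @unsorted = ('Dog', 'Cat', undef, 'Bird');

# Sentinel key: a single chr(255), assumed to compare greater than any
# real key. That assumption is exactly what needs proving before use.
my $max_key = chr(255);

# Schwartzian Transform: decorate, sort on the key, undecorate.
my @sorted =
    map  { $_->[1] }                              # 3. keep the value only
    sort { $a->[0] cmp $b->[0] }                  # 2. compare the keys
    map  { [ defined $_ ? $_ : $max_key, $_ ] }   # 1. pair key with value
    @unsorted;

print join(', ', map { defined $_ ? $_ : '(undef)' } @sorted), "\n";
# prints: Bird, Cat, Dog, (undef)
```

The undef entries travel through the sort untouched because only the decorated key is compared.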

As to your other note, Unicode characters "above 255" are actually multi-byte sequences whose individual bytes still cannot exceed the architectural limitation of chr(255) so I question that perceived vulnerability.

Replies are listed 'Best First'.
Re^5: Sort undef
by Anonymous Monk on Jun 17, 2017 at 03:50 UTC
    Perl strings are sequences of Unicode code points, not sequences of bytes. (Well, I think they're stored internally as plain bytes if possible.)
    use Encode qw( encode_utf8 );
    my $x = chr(1 << 63);
    print length($x), "\n";
    print length(encode_utf8($x)), "\n";
    print "yep\n" if $x gt chr(255);
    Output:
    Use of code point 0x8000000000000000 is deprecated; the permissible max is 0x7FFFFFFFFFFFFFFF at foo line 2.
    1
    13
    yep

      Going back to my attitude when I was a C programmer: It pays to know how your compiler thinks. (Needs adjustment for application to modern use of Perl, but the sentiment is the same.)

      Adding/replacing these three lines into my original script above:

      use Encode qw( encode_utf8 );
      my $x = chr(1 << 63);
      my @Unsorted = ( 'Dog', 'Cat', 'Bird', undef, $x, 'Elephant', undef, 'Lizard' );

      Yields:

      S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>perl .\sort011.pl
      -------------------------------------------------------------------------------
      Original:
      -------------------------------------------------------------------------------
      Dog
      Cat
      Bird
      (undef)
      Wide character in print at .\sort011.pl line 58.
      Elephant
      (undef)
      Lizard
      -------------------------------------------------------------------------------
      -------------------------------------------------------------------------------
      Custom Sort:
      -------------------------------------------------------------------------------
      Bird
      Cat
      Dog
      Elephant
      Lizard
      (undef)
      (undef)
      Wide character in print at .\sort011.pl line 58.
      -------------------------------------------------------------------------------
      S:\Steve\Dev\PerlMonks\P-2017-06-12@0734-sort-undef>

      A string of chr(255) bytes longer than the longest item in the original array still fails to sort to the bottom, because the wide character compares greater than chr(255) no matter how many times the sentinel is repeated. Knowing that Unicode characters are stored differently than old-fashioned ASCII strings empowers the Perl programmer to make a better choice.
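One possible "better choice", sketched under assumptions: drop the sentinel and compare definedness first, so no key has to outrank every possible string. This is a swapped-in technique, not the script from this thread. The sample data and the names `@unsorted` and `@sorted` are illustrative, and chr(0x100) stands in for the wide character because recent Perls reject chr(1 << 63) outright.

```perl
use strict;
use warnings;

# Avoid "Wide character in print" by declaring the output encoding.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical data: chr(0x100) is a single character above 255.
my $wide     = chr(0x100);
my @unsorted = ('Dog', 'Cat', undef, $wide, 'Bird', undef);

# Why the sentinel fails: comparison is by code point, so the first
# character decides, however long the run of chr(255) is.
print "sentinel loses\n" if $wide gt chr(255) x 10;

my @sorted = sort {
    # 0 for defined, 1 for undef, so undef always sorts last.
    my ($ua, $ub) = (defined $a ? 0 : 1, defined $b ? 0 : 1);
    ($ua <=> $ub)
        ||
    # Same definedness: two undefs tie, two strings compare normally.
    ($ua ? 0 : $a cmp $b)
} @unsorted;

print defined $_ ? $_ : '(undef)', "\n" for @sorted;
```

Here the wide character lands after Dog and before the two undef entries, with no assumption about the largest possible key.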

      Thank you for the information!

      I'd upvote the post, but there isn't any point, as it's Anonymous Monk.
