PerlMonks  

Re^2: truncate string to byte count

by ikegami (Pope)
on Feb 28, 2019 at 20:25 UTC (#1230686)


in reply to Re: truncate string to byte count
in thread truncate string to byte count

This utf8cut is buggy: it suffers from The Unicode Bug. Its output depends on how the string is stored internally, which is a bug.

For example, passing a string consisting of characters 80 and 80 with a second argument of 2 will can result in "\x80" (correct) and "\x80\x80" (incorrect).
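The dependence on internal storage can be illustrated without the function itself. The sketch below assumes nothing about how utf8cut is written; it only shows that one string value can be held in two internal formats with different buffer sizes, which is what storage-sensitive code ends up observing. `bytes::length` is used here purely to peek at the internal buffer.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

my $s = "\x80\x80";
utf8::upgrade(   my $u = $s );   # stored internally as UTF-8
utf8::downgrade( my $d = $s );   # stored internally as one byte per character

# Same value as far as Perl's string operators are concerned.
print "equal\n" if $u eq $d;
print "chars: ", length($u), " vs ", length($d), "\n";

# Different internal buffers: the upgraded copy needs two bytes per character.
print "internal bytes: ", bytes::length($u), " vs ", bytes::length($d), "\n";
```

Both copies compare equal and have the same character length; only the internal byte count differs. A function that counts those internal bytes instead of encoding the string explicitly will treat the two copies differently.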

Replies are listed 'Best First'.
Re^3: truncate string to byte count
by haukex (Bishop) on Feb 28, 2019 at 20:54 UTC
    For example, passing a string consisting of characters 80 and 80 with a second argument of 2 will can [sic] result in "\x80" (correct) and "\x80\x80" (incorrect).

    The way you've worded this makes it sound like the output is not deterministic, which is certainly not the case. Also, "a string consisting of characters 80 and 80" is not specific enough for a test case. But please feel free to provide some actual test code that demonstrates the bug you are trying to explain, or better yet, show how you would've coded it to (at least in your view) "correctly" handle the different strings "\x80\x80" and "\N{U+80}\N{U+80}".

      But please feel free to provide some actual test code that demonstrates the bug

      Code that suffers from The Unicode Bug is code that returns different results for equal strings. This is easily demonstrated using the following:

      my $s = "\x80\x80";
      utf8::upgrade( my $u = $s );
      utf8::downgrade( my $d = $s );
      is($u, $d);
      is(utf8cut($u,2), utf8cut($d,2));

      better yet, show how you would've coded it to (at least in your view) "correctly" handle the different strings "\x80\x80" and "\N{U+80}\N{U+80}".

      Perl considers those the same value, and any code that doesn't is by definition suffering from The Unicode Bug.
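For illustration only, here is a minimal sketch of a truncation routine whose result cannot depend on internal storage. It is not the utf8cut from the parent node, nor code posted in this thread: the name `utf8_truncate` is hypothetical, and it assumes the goal is to keep the longest prefix of the string whose UTF-8 encoding fits in the given number of bytes. It measures each character by explicitly encoding a copy with `utf8::encode`, and otherwise uses only character-level operations (`length`, `substr`).

```perl
use strict;
use warnings;

# Hypothetical helper, not the utf8cut under discussion.
# Returns the longest prefix of $str whose UTF-8 encoding is at most
# $max_bytes bytes long, never splitting a character.
sub utf8_truncate {
    my ($str, $max_bytes) = @_;
    my $used = 0;   # bytes the kept prefix would occupy once encoded
    my $keep = 0;   # number of characters to keep
    for my $i (0 .. length($str) - 1) {
        my $ch = substr($str, $i, 1);
        utf8::encode($ch);                        # in place: character -> UTF-8 octets
        last if $used + length($ch) > $max_bytes;
        $used += length($ch);
        $keep++;
    }
    return substr($str, 0, $keep);
}

my $s = "\x80\x80";
utf8::upgrade(   my $u = $s );
utf8::downgrade( my $d = $s );

print utf8_truncate($u, 2) eq utf8_truncate($d, 2) ? "same\n" : "different\n";
```

Because U+0080 takes two bytes in UTF-8, a limit of 2 keeps exactly one character for both the upgraded and the downgraded copy.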
