PerlMonks  

Re^2: Simplest Possible Way To Disable Unicode

by BrowserUk (Patriarch)
on May 25, 2011 at 10:17 UTC ( [id://906627]=note )


in reply to Re: Simplest Possible Way To Disable Unicode
in thread Simplest Possible Way To Disable Unicode

a lot of guru coders not so happy because some of their pack, syswrite or what else spells have lost the shining of primeval eras..

That simply isn't what is going on here.

The docs for pack say:

  • C   An unsigned char (octet) value.
  • W   An unsigned char value (can be greater than 255).
  • U   A Unicode character number.  Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.

Now let's see what happens when we assign oversized values to other unsigned types:

    print unpack 'S', pack 'S', 65537;;
    1
    print unpack 'L', pack 'L', 2**32+1;;
    1
    print unpack 'Q', pack 'Q', 2**64+1;;
    18446744073709551615

It silently wraps (or truncates) as is expected and normal.
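The same wrap can be reproduced as a standalone script. This is only a minimal sketch: `roundtrip` is a hypothetical helper name (not from the post), and it sticks to the 16-bit templates 'S' and 'n' so the result should not depend on the platform's integer width.

```perl
use strict;
use warnings;

# Hypothetical helper: round-trip a value through one pack template.
sub roundtrip {
    my ($template, $value) = @_;
    return unpack $template, pack $template, $value;
}

# 65537 is 0x1_0001; only the low 16 bits survive, and no warning is issued.
print roundtrip('S', 65537), "\n";        # native 16-bit unsigned short
print roundtrip('n', 65536 + 258), "\n";  # big-endian 16-bit unsigned short
```

Run under `use warnings`, neither line should produce a diagnostic, which is the behaviour the 'C' case below is being compared against.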

Contrast that with what now (since the advent of unicode support) happens with unsigned char values:

    print unpack 'C', pack 'C', 2**8+1;;
    Character in 'C' format wrapped in pack at (eval 17) line 1, <STDIN> line 9.
    1

A dumb warning that can only be disabled by disabling *all* pack warnings. Don't forget the 'W' and 'U' types above.

It is perfectly reasonable to expect silent truncation of oversized values with unsigned char types ('C'). Just as was the case with 'C' before the addition of unicode support; and just as is still the case with all other unsigned types. This is not an error, nor "sloppy coding"; it is the norm for these types.
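For code that wants the old silent truncation today, two workarounds are possible. The sketch below assumes the warning belongs to the 'pack' warnings category (as perldiag lists it); `pack_octet` and `pack_octet_quiet` are hypothetical helper names introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: mask to the low 8 bits first, so 'C' never sees
# an out-of-range value and has nothing to warn about.
sub pack_octet {
    my ($value) = @_;
    return pack 'C', $value & 0xFF;
}

# Hypothetical helper: let pack do the wrapping, but silence the whole
# 'pack' warnings category inside this one lexical scope.
sub pack_octet_quiet {
    my ($value) = @_;
    no warnings 'pack';
    return pack 'C', $value;
}

print ord(pack_octet(257)), "\n";        # low 8 bits of 257
print ord(pack_octet_quiet(257)), "\n";  # same value, wrapped by pack itself
```

The first form states the truncation explicitly at the call site; the second keeps the original code shape but, as noted above, also hides every other pack diagnostic in that scope.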

Now contrast this spurious warning with what happens when you use chr with oversized values:

    $s = chr( 257 );;
    print do{ use bytes; length $s, unpack 'C*', $s };;
    2 196 129

Perl silently accepts this error, and erroneously constructs a multi-byte character.
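The same observation can be made without `use bytes`, by asking Perl how it holds the string. A minimal sketch: `describe` is a hypothetical helper, and it reports the octets of the string's UTF-8 encoding (via `utf8::encode` on a copy) rather than peeking at the internal buffer directly.

```perl
use strict;
use warnings;

# Hypothetical helper: report the character count, whether the internal
# UTF-8 flag is set, and the octets of the string's UTF-8 encoding.
sub describe {
    my ($string) = @_;
    my $octets = $string;      # work on a copy, leave the caller's string alone
    utf8::encode($octets);     # character string -> UTF-8 octets
    return {
        chars   => length $string,
        flagged => utf8::is_utf8($string) ? 1 : 0,
        octets  => [ unpack 'C*', $octets ],
    };
}

my $info = describe(chr 257);
printf "%d char, flagged=%d, octets=%s\n",
    $info->{chars}, $info->{flagged}, join ' ', @{ $info->{octets} };
```

For `chr 257` this reports one character, the flag set, and the two octets 196 and 129 seen in the session above, all without any diagnostic at the point where the string was created.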

And you only discover this error when you try to print it:

    print $s, length $s;;
    Wide character in print at (eval 19) line 1, <STDIN> line 11.
    ─ü 1

Which may not happen until dozens or hundreds of lines further on into the code; perhaps in another of your source files; perhaps in a module you didn't write or even know that you were (indirectly) using.

That is the very worst kind of error situation: action at a distance.

So, the problem is not (only) that "spells have lost the shining of primeval eras", but rather that the current, here-today-and-tomorrow state of play is that Perl issues spurious warnings for code that has always been considered normal (and, on the evidence of other similar current operations, still should be), whilst silently not just ignoring a possible programmer error, but then making asinine assumptions and implementing the wrong thing, in a way that makes such errors horribly difficult to track down.

You cannot have it both ways. Fobbing this off with "documentation error" or "ancient sloppy coding practices" doesn't cut it.

Either *all* oversized assignments to unsigned types should silently truncate; or *all* should warn.

Either chr should be only for 8-bit bytes, and attempts to set oversized values should warn in situ; or chr should accept multi-byte ordinals, and print should know how to handle them.

Except the latter is impossible because Unicode is such a crock.

One solution would be to add a wchr function that accepted multi-byte ordinals. That would make it very clear that the programmer is expecting to program with MBCSs and allow chr to catch coding errors at source.
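To make the proposal concrete, here is a user-level approximation. Everything in it is hypothetical: Perl has no built-in `wchr`, and `byte_chr` is a made-up name for the strict, octets-only behaviour argued for above. It is a sketch of the idea, not of how core would implement it.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical strict chr: octets only. An oversized ordinal fails at
# the call site instead of surfacing later as a wide-character warning.
sub byte_chr {
    my ($ordinal) = @_;
    croak "byte_chr: ordinal $ordinal does not fit in 8 bits"
        if $ordinal < 0 || $ordinal > 255;
    return chr $ordinal;
}

# Hypothetical wchr: the caller explicitly asks for a wide character,
# so an ordinal above 255 is intentional rather than a mistake.
sub wchr {
    my ($ordinal) = @_;
    return chr $ordinal;
}

my $ok   = byte_chr(65);             # fine: fits in one octet
my $wide = wchr(257);                # fine: explicitly wide
my $bad  = eval { byte_chr(257) };   # dies; the message is left in $@
print "caught: $@" if $@;
```

With this split, the error is reported where the bad ordinal is produced, with a caller-side line number from `croak`, rather than at some later print.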

Another, in my opinion preferable, solution would be to have pre-Unicode-support semantics followed everywhere, unless a use Unicode; statement was seen.

I.e. instead of having to try (and fail) to disable these changes with use bytes; when you don't want them, you would ask for Unicode semantics when you do want them. Seem logical?

Unfortunately, it is too late for that.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Simplest Possible Way To Disable Unicode
by Discipulus (Canon) on May 25, 2011 at 10:41 UTC
    ..ohhh

    I chose to speak ironically (spell, shine, ..) exactly because I did not have a clear idea about what was going on..
    thanks for the explanation.

    Lor*
    there are no rules, there are no thumbs..
