PerlMonks  

"a lot of guru coders not so happy because some of their pack, syswrite or what else spells have lost the shining of primeval eras.."

That simply isn't what is going on here.

The docs for pack say:

  • C   An unsigned char (octet) value.
  • W   An unsigned char value (can be greater than 255).
  • U   A Unicode character number.  Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.

Now let's see what happens when we assign oversized values to other unsigned types:

    print unpack 'S', pack 'S', 65537;;
    1
    print unpack 'L', pack 'L', 2**32+1;;
    1
    print unpack 'Q', pack 'Q', 2**64+1;;
    18446744073709551615

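The value silently wraps (or truncates), as is expected and normal. (In the 'Q' case the result is clamped to the 64-bit maximum rather than wrapped, but still without any warning.)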
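To make the wrapping concrete, here is a minimal sketch showing that the round trip through 'S' and 'L' gives the same result as masking to the field width. The helper name roundtrip is mine, not anything from pack's docs, and the 'L' line assumes a perl built with 64-bit integers (as the outputs above suggest).

```perl
use strict;
use warnings;

# Hypothetical helper: push a value through a fixed-width unsigned
# pack format and read it back.
sub roundtrip {
    my ($format, $value) = @_;
    return unpack($format, pack($format, $value));
}

# 16-bit field: only the low 16 bits survive, and no warning is issued.
my $short = roundtrip('S', 65537);

# 32-bit field: only the low 32 bits survive (assumes 64-bit integers).
my $long = roundtrip('L', 2**32 + 1);

printf "S: %d (masked: %d)\n", $short, 65537 & 0xFFFF;
printf "L: %d (masked: %d)\n", $long, (2**32 + 1) & 0xFFFF_FFFF;
```

Both lines should agree with their masked counterparts, and nothing should appear on STDERR even with warnings enabled.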

Contrast that with what now (since the advent of unicode support) happens with unsigned char values:

    print unpack 'C', pack 'C', 2**8+1;;
    Character in 'C' format wrapped in pack at (eval 17) line 1, <STDIN> line 9.
    1

A dumb warning that can only be disabled by disabling *all* pack warnings. Don't forget the 'W' and 'U' types above.
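For what it is worth, here is a sketch of the two workarounds I know of: scoping the suppression as tightly as possible, or masking the value yourself so that pack never sees anything oversized. The sub names are made up for illustration; the 'pack' warnings category is the one perldiag lists for this message.

```perl
use strict;
use warnings;

# Workaround 1: switch off the whole 'pack' category, but only
# inside this one lexical scope.
sub low_byte {
    my ($value) = @_;
    no warnings 'pack';
    return unpack('C', pack('C', $value));
}

# Workaround 2: do the truncation explicitly, so there is nothing
# for pack to warn about.
sub low_byte_masked {
    my ($value) = @_;
    return unpack('C', pack('C', $value & 0xFF));
}

print low_byte(2**8 + 1), " ", low_byte_masked(2**8 + 1), "\n";
```

Neither is a real answer to the complaint: the first still silences every other pack warning in that scope, and the second is extra code that no other unsigned type requires.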

It is perfectly reasonable to expect silent truncation of oversized values with unsigned char types ('C'). Just as was the case with 'C' before the addition of unicode support; and just as is still the case with all other unsigned types. This is not an error, nor "sloppy coding"; it is the norm for these types.

Now contrast this spurious warning with what happens when you use chr with oversized values:

    $s = chr( 257 );;
    print do{ use bytes; length $s, unpack 'C*', $s };;
    2 196 129

Perl silently accepts this error, and erroneously constructs a multi-byte character.
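The two octets seen above are simply the UTF-8 encoding of code point 257. A small sketch, assuming the core Encode module, that gets at the same numbers without use bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $s = chr(257);

# One character as far as Perl is concerned...
my $chars = length $s;

# ...but two octets once it is encoded as UTF-8 (0xC4 0x81).
my @octets = unpack 'C*', encode('UTF-8', $s);

print "$chars character, octets: @octets\n";
```

So what was asked for as a single "char" has quietly become a two-octet sequence, and nothing was said at the point where it happened.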

And you only discover this error when you try to print it:

    print $s, length $s;;
    Wide character in print at (eval 19) line 1, <STDIN> line 11.
    ─ 1

Which may not happen until dozens or hundreds of lines further on into the code; perhaps in another of your source files; perhaps in a module you didn't write or even know that you were (indirectly) using.

That is the very worst kind of error situation: action at a distance.
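Until Perl does this for you, about the best you can do is check at the boundaries of your own code. A rough sketch; assert_octets is a name I have invented here, not a core or CPAN function:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: die, blaming the caller, if the string holds
# any character with an ordinal above 255.
sub assert_octets {
    my ($string, $label) = @_;
    if ($string =~ /([^\x00-\xFF])/) {
        croak sprintf "%s contains wide character U+%04X at offset %d",
            $label, ord $1, $-[1];
    }
    return $string;
}

my $ok = assert_octets("abc", 'header');        # passes through unchanged
my $caught = eval { assert_octets("ab" . chr(257), 'payload'); 1 } ? '' : $@;
print "caught: $caught";
```

That at least moves the failure back towards the place where the bad value was built, instead of wherever it eventually gets printed.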

So the problem is not (only) that "spells have lost the shining of primeval eras", but rather that the current, here-today-and-tomorrow state of play is this: Perl issues spurious warnings for code that has always been considered normal (and, by the evidence of other similar current operations, still should be). At the same time, it not only silently ignores a possible programmer error, but then makes asinine assumptions and implements the wrong thing, in a way that makes such errors horribly difficult to track down.

You cannot have it both ways. Fobbing this off with "documentation error" or "ancient sloppy coding practices" doesn't cut it.

Either *all* oversized assignments to unsigned types should silently truncate; or *all* should warn.

Either chr should be only for 8-bit bytes, and attempts to set oversized values should warn in situ; or chr should accept multi-byte ordinals, and print should know how to handle them.

Except the latter is impossible because Unicode is such a crock.

One solution would be to add a wchr function that accepted multi-byte ordinals. That would make it very clear that the programmer is expecting to program with MBCSs and allow chr to catch coding errors at source.
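Roughly what I have in mind, emulated in user code. Both subs are hypothetical: neither byte_chr nor wchr exists in Perl, and this only approximates the proposed split between the two.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical strict chr: one octet only, complains at the call site.
sub byte_chr {
    my ($ord) = @_;
    croak "byte_chr: ordinal $ord does not fit in one octet"
        if $ord < 0 || $ord > 255;
    return chr $ord;
}

# Hypothetical wide chr: the programmer is explicitly asking for a
# character that may need more than one octet.
sub wchr {
    my ($ord) = @_;
    return chr $ord;
}

my $byte = byte_chr(65);                    # 'A'
my $wide = wchr(257);                       # one (wide) character
my $err  = eval { byte_chr(257); 1 } ? '' : $@;
print "byte=$byte wide-length=", length($wide), " error=$err";
```

With that split, the mistake is reported on the line that made it, and the wide case is visibly intentional.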

Another, in my opinion preferable, solution would be for pre-Unicode semantics to be followed everywhere unless a use Unicode; statement was seen.

I.e., instead of having to try (and fail) to disable these changes with use bytes; when you don't want them, you ask for Unicode semantics when you do want them. Seems logical?

Unfortunately, it is too late for that.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: Simplest Possible Way To Disable Unicode by BrowserUk
in thread Simplest Possible Way To Disable Unicode by JapanIsShinto
