http://qs321.pair.com?node_id=906418


in reply to Re: Simplest Possible Way To Disable Unicode
in thread Simplest Possible Way To Disable Unicode

it should be silently truncated.

no warnings qw( pack );

More to the point, I think unicode should be explicitly enabled by those that need it

You're getting an overflow warning. It has nothing to do with Unicode. In fact, pack and unpack don't use Unicode at all.*

* — Not even "U" has any understanding of Unicode.

>perl -wE"say sprintf '%X', unpack 'U', pack 'U', 0x200000"
200000

Re^3: Simplest Possible Way To Disable Unicode
by BrowserUk (Patriarch) on May 24, 2011 at 05:59 UTC
    no warnings qw( pack );

    So, you'd have us throw away all the useful warnings that pack can emit when I do something wrong in order to disable the stupid warning emitted when it does something wrong. Cool-io. Not.

    You're getting an overflow warning.

    Oh sure. "Wide character" says 'overflow', like super-injunction says right to privacy for all.

    It has nothing to do with Unicode.

    Really? Can you guess where this direct quote " A Unicode character number." comes from?

    I don't give a flying fig whether you want to conflate the term 'unicode' with that multiplicitous cock-up of formats that hide behind the moniker 'The Unicode Standard'(*), and can't see that I used the former as a short-hand for 'multi-byte character sets'.

    Which should of course be 'The Multicode Standards:Everything including the (7 different) kitchen sinks'

    * — Not even "U" has any understanding of Unicode.

    >perl -wE"say sprintf '%X', unpack 'U', pack 'U', 0x200000"
    200000

    Wadday'know. If you pack with U and unpack with U you get back what you packed. D'uh. A pointless example of nothing much.

    This is the problem.

    perl -wE"$s=pack 'U*', 257; say length $s; print for unpack 'C*', $s;"
    1
    257

    That totally devalues the purpose of having two different template characters.

    • one for C   An unsigned char (octet) value.
    • one for U   A Unicode character number.  Encodes to a character in character mode and UTF-8 ... in byte mode.

    That should not happen. And I shouldn't have to state that I don't want it to happen:

    >perl -Mbytes -wE"$s=pack 'U*', 257; say length $s; say for unpack 'C*', $s;"
    2
    196
    129
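For anyone who wants the UTF-8 octets without leaning on -Mbytes, one option is to make the character-to-byte step explicit. This is only a minimal sketch, assuming the core Encode module is acceptable; utf8_octets is a hypothetical helper name, not something from this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 octets for a single code point.
sub utf8_octets {
    my ($code_point) = @_;
    # Explicit character -> byte conversion; nothing is left to pack's mode.
    my $octets = encode('UTF-8', chr $code_point);
    # The string now holds only values 0..255, so 'C*' cannot wrap.
    return unpack 'C*', $octets;
}

print join(' ', utf8_octets(257)), "\n";   # prints "196 129"
```

The result matches the -Mbytes output quoted above, but the intent is stated in the code rather than in a pragma.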

    It breaks backward compatibility in the very worst way.

    • Screaming when you are doing nothing wrong.

      Breaking both existing, working code and existing expectations. And causing people to disable important and useful warnings to silence it.

    • And saying nothing at all when it does the wrong thing.

      Just silently breaking previously working, 'best practice' code violating every expectation and rule of change and enhancement.

    The Unicode Standard is a cock-up. And the Perl implementation worse.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      So, you'd have us throw away all the useful warnings

      I'm not sure what other warnings pack 'C' or pack in general can emit. You could submit a patch so that pack overflow warnings are a subclass of pack warnings.
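Short of patching the warning categories, there are two ways to keep the rest of the pack warnings alive. The sketch below shows both under stated assumptions: the helper names pack_octets_truncated and pack_octets_quiet are hypothetical, and the first one assumes that "silently truncated" means keeping the low 8 bits.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate each value to its low 8 bits before packing.
# Nothing wraps inside pack, so no warning is raised and none is disabled.
sub pack_octets_truncated {
    my @values = @_;
    return pack 'C*', map { $_ & 0xFF } @values;
}

# Hypothetical helper: disable the 'pack' category, but only inside this sub.
# Code outside the sub still gets every pack warning.
sub pack_octets_quiet {
    my @values = @_;
    no warnings 'pack';
    return pack 'C*', @values;
}

printf "%vd\n", pack_octets_truncated(257, 65);   # prints "1.65"
```

Masking is the narrower of the two, since it changes no warning settings at all.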

      Oh sure. "Wide character" says 'overflow', like super-injunction says right to privacy for all.

      It doesn't say "Wide character".

      >perl -we"$_ = pack 'C*', 257"
      Character in 'C' format wrapped in pack at -e line 1.

      It's saying how it handled an overflow.
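To see exactly what is emitted and what ends up in the string, the warning can be captured instead of printed. A rough sketch, assuming warnings are enabled as below; pack_with_warnings is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: run pack 'C*' and return the result plus any warnings.
sub pack_with_warnings {
    my @values = @_;
    my @caught;
    # Collect warnings for the duration of this call instead of printing them.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $octets = pack 'C*', @values;
    return ($octets, @caught);
}

my ($packed, @messages) = pack_with_warnings(257);
printf "stored value: %d\n", ord $packed;   # the wrapped result
print "warning: $_" for @messages;
```

The stored value is 1, that is 257 with everything above the low 8 bits discarded, and the message is the "wrapped" one quoted above.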

      Really? Can you guess where this direct quote " A Unicode character number." comes from?

      That's easy, but moot. I've already pointed out the documentation is wrong. There is no such thing as Unicode number 0x200000, yet

      >perl -wE"say sprintf '%X', unpack 'U', pack 'U', 0x200000"
      200000

      The docs sometimes assign Unicode semantics to operations where no such semantics exist. "A Unicode character number." should simply be "A character number." In Perl, a character is a number in 0 to UVMAX.
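The same point can be made without the one-liner quoting. This is a small sketch rather than anything authoritative; round_trip is a hypothetical helper name, and the `no warnings 'utf8'` line is an assumption added in case a newer perl complains about code points above U+10FFFF.

```perl
use strict;
use warnings;
no warnings 'utf8';   # assumption: some perls warn about non-Unicode code points

# Hypothetical helper: push a number through chr/ord and through pack/unpack 'U'.
sub round_trip {
    my ($number) = @_;
    my $via_chr  = ord chr $number;
    my $via_pack = unpack 'U', pack 'U', $number;
    return ($via_chr, $via_pack);
}

# 0x200000 is above U+10FFFF, the last Unicode code point.
printf "%X %X\n", round_trip(0x200000);   # prints "200000 200000"
```

Both paths hand back the number unchanged, whether or not Unicode assigns it any meaning.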

        It doesn't say "Wide character".

        Specific error message aside, Perl should never treat a number as a 'wide character' without explicit notification from the programmer that that is his intent.

        c:\test>perl -we"print chr( 257 )" | wc -c
        Wide character in print at -e line 1.
        2

        I've already pointed out the documentation is wrong.

        No! You didn't. Nowhere prior to this post anywhere in this thread.

        There is no such thing as Unicode number 0x200000, yet

        So, the documentation is wrong! And the implementation is (silently) wrong!

        That pretty much covers everything. Unicode support in perl is broken.

        In Perl, a character is a number in 0 to UVMAX.

        And that bullshit is exactly why it is so broken.

        Because &^*&% like you will keep on conflating 'numbers' with 'characters'.

        1. UVMAX is CPU dependent.

          Typically 4294967295 or 18446744073709551615, but with other values possible.

        2. The term 'character' has no meaning outside of some mapping.

          Unless a number can be mapped to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language, it is just a number.

          And even when it can be so mapped, until it is mapped, it is still just a number.

          And any suggestion otherwise is just so much bullshit.

        3. And 4294967295, much less 18446744073709551615, cannot be mapped to 'a character' in any known or proposed mapping.

          Which makes this:

          In Perl [or any language], a character is a number in 0 to UVMAX.
          stand out as the total twaddle it is.
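The platform dependence in item 1 is easy to check on a given build. A minimal sketch, assuming the core Config module; uv_max is a hypothetical helper name for what this thread calls UVMAX.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: the largest unsigned integer this perl can hold.
# Complementing 0 sets every bit of the native unsigned integer type.
sub uv_max { return ~0 }

printf "UV size: %d bytes, maximum: %u\n", $Config{uvsize}, uv_max();
```

A build with 4-byte UVs reports 4294967295 and one with 8-byte UVs reports 18446744073709551615.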

        Unicode support in Perl is broken. And until people like you stop pretending that it isn't it will stay that way.

        Indeed, until those that do, stop trying to pretend that you can transparently handle the abortion that is Unicode, whether retro-fitting an existing language or implementing a new one, the longer it will be before we can evolve some sane semantics for handling MBCSs.

