PerlMonks  

Re^3: Simplest Possible Way To Disable Unicode

by BrowserUk (Pope)
on May 24, 2011 at 05:59 UTC (#906423)


in reply to Re^2: Simplest Possible Way To Disable Unicode
in thread Simplest Possible Way To Disable Unicode

no warnings qw( pack );

So, you'd have us throw away all the useful warnings that pack can emit when I do something wrong in order to disable the stupid warning emitted when it does something wrong. Cool-io. Not.

You're getting an overflow warning.

Oh sure. "Wide character" says 'overflow', like super-injunction says right to privacy for all.

It has nothing to do with Unicode.

Really? Can you guess where this direct quote, "A Unicode character number.", comes from?

I don't give a flying fig whether you want to conflate the term 'unicode' with that multiplicitous cock-up of formats that hide behind the moniker 'The Unicode Standard'(*), and can't see that I used the former as a short-hand for 'multi-byte character sets'.

Which should of course be 'The Multicode Standards:Everything including the (7 different) kitchen sinks'

* - Not even "U" has any understanding of Unicode.

>perl -wE"say sprintf '%X', unpack 'U', pack 'U', 0x200000"
200000

Wadday'know. If you pack with U and unpack with U you get back what you packed. D'uh. A pointless example of nothing much.

This is the problem.

perl -wE"$s=pack 'U*', 257; say length $s; print for unpack 'C*', $s;"
1
257

That totally devalues the purpose of having two different template characters.

  • one for C   An unsigned char (octet) value.
  • one for U   A Unicode character number.  Encodes to a character in character mode and UTF-8 ... in byte mode.

That should not happen. And I shouldn't have to state that I don't want it to happen:

>perl -Mbytes -wE"$s=pack 'U*', 257; say length $s; say for unpack 'C*', $s;"
2
196
129
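
For comparison, a minimal sketch of getting the same octets without the bytes pragma, by encoding explicitly. The helper name `utf8_octets` is made up for illustration; `utf8::encode` is a core function that converts a character string to UTF-8 bytes in place.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 octets of a character string
# as a list of numbers, without relying on the bytes pragma.
sub utf8_octets {
    my ($chars) = @_;
    my $bytes = $chars;          # work on a copy; utf8::encode modifies in place
    utf8::encode($bytes);        # characters -> UTF-8 encoded bytes
    return unpack 'C*', $bytes;  # every element of $bytes is now below 256
}

my $s = pack 'U*', 257;                   # one character with ordinal 257
print length($s), "\n";                   # character count: 1
print join(' ', utf8_octets($s)), "\n";   # 196 129
```

This keeps the character string and its encoded form as two separate values, so the choice between them is visible at the call site.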

It breaks backward compatibility in the very worst way.

  • Screaming when you are doing nothing wrong.

    Breaking both existing, working code and existing expectations. And causing people to disable important and useful warnings to silence it.

  • And saying nothing at all when it does the wrong thing.

    Just silently breaking previously working, 'best practice' code, violating every expectation and rule of change and enhancement.

The Unicode Standard is a cock-up. And the Perl implementation worse.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Simplest Possible Way To Disable Unicode
by ikegami (Pope) on May 24, 2011 at 06:20 UTC

    So, you'd have us throw away all the useful warnings

    I'm not sure what other warnings pack 'C' or pack in general can emit. You could submit a patch so that pack overflow warnings are a subclass of pack warnings.

    Oh sure. "Wide character" says 'overflow', like super-injunction says right to privacy for all.

    It doesn't say "Wide character".

    >perl -we"$_ = pack 'C*', 257"
    Character in 'C' format wrapped in pack at -e line 1.

    It's saying how it handled an overflow.
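
The wrapping behind that warning can be seen directly: `pack 'C'` keeps only the low 8 bits of the value. The sketch below also relies on `no warnings 'pack'` being lexically scoped, so it can be confined to a single sub rather than applied file-wide. The sub name `wrapped_byte` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a number with 'C' and report the byte stored.
sub wrapped_byte {
    my ($n) = @_;
    no warnings 'pack';          # silenced only inside this sub
    return unpack 'C', pack 'C', $n;
}

print wrapped_byte(65),  "\n";   # fits in a byte, stored unchanged
print wrapped_byte(257), "\n";   # 257 & 0xFF, i.e. 1
```

Outside `wrapped_byte`, pack warnings stay enabled as usual.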

    Really? Can you guess where this direct quote " A Unicode character number." comes from?

    That's easy, but moot. I've already pointed out the documentation is wrong. There is no such thing as Unicode number 0x200000, yet

    >perl -wE"say sprintf '%X', unpack 'U', pack 'U', 0x200000"
    200000

    The docs sometimes assign Unicode semantics to operations where no such semantics exist. "A Unicode character number." should simply be "A character number." In Perl, a character is a number in 0 to UVMAX.
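
A small sketch of that point, using only `chr`, `ord` and `length`: values beyond the Unicode maximum of 0x10FFFF still round-trip as single string elements. The helper name `round_trips` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: does $n survive a chr/ord round trip as one element?
sub round_trips {
    my ($n) = @_;
    my $c = chr $n;
    return length($c) == 1 && ord($c) == $n;
}

for my $n (0x41, 0x101, 0x10FFFF, 0x200000) {
    printf "%X: %s\n", $n, round_trips($n) ? 'ok' : 'not ok';
}
```

Nothing here consults a character set; the string simply stores the numbers it was given.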

      It doesn't say "Wide character".

      Specific error message aside, Perl should never treat a number as a 'wide character' without explicit notification from the programmer that that is his intent.

      c:\test>perl -we"print chr( 257 )" | wc -c
      Wide character in print at -e line 1.
      2

      I've already pointed out the documentation is wrong.

      No! You didn't. Nowhere prior to this post anywhere in this thread.

      There is no such thing as Unicode number 0x200000, yet

      So, the documentation is wrong! And the implementation is (silently) wrong!

      That pretty much covers everything. Unicode support in perl is broken.

      In Perl, a character is a number in 0 to UVMAX.

      And that bullshit is exactly why it is so broken.

      Because &^*&% like you will keep on conflating 'numbers' with 'characters'.

      1. UVMAX is CPU dependent.

        Typically 4294967296 or 18446744073709551616, but with other values possible.

      2. The term 'character' has no meaning outside of some mapping.

        Unless a number can be mapped to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language, it is just a number.

        And even when it can be so mapped, until it is mapped, it is still just a number.

        And any suggestion otherwise is just so much bullshit.

      3. And 4294967296, much less 18446744073709551616 cannot be mapped to 'a character' in any known or proposed mapping.

        Which makes this:

        In Perl [or any language], a character is a number in 0 to UVMAX.
        stand out as the total twaddle it is.

      Unicode support in Perl is broken. And until people like you stop pretending that it isn't, it will stay that way.

      Indeed, the longer people keep pretending that you can transparently handle the abortion that is Unicode, whether retro-fitting an existing language or implementing a new one, the longer it will be before we can evolve some sane semantics for handling MBCSs.



        Perl should never treat a number as a 'wide character' without explicit notification from the programmer that that is his intent.

        Judging by your example, I think you mean you don't want wide characters to automatically get encoded to UTF-8. (Correct me if I'm wrong.)

        What do you propose instead? I can think of a couple.

        • Dying like syswrite? I'm not sure that's better, but I could easily be convinced.

        • Silently convert the numbers to UTF-8? I definitely want at least a warning if non-bytes are passed to print when warnings are on. I don't care what output it produces. Currently, it also warns when warnings are off. That's not appropriate, but I think that's supposed to change.

        • Silently truncate the high bits? Same reply as previous.

        The term 'character' has no meaning outside of some mapping.

        Characters have no meaning outside a mapping, but the term does. It's simply the basic unit of a string.

        And even when it can be so mapped, until it is mapped, it is still just a number.

        I fully agree. That's why I said pack doesn't deal with Unicode. It just deals with numbers. So do chr, ord, substr, index, etc.

        Operators that do use mappings are lc, \d in regex patterns, etc.

        And 4294967296, much less 18446744073709551616 cannot be mapped to 'a character' in any known or proposed mapping.

        No, but 4294967295 is a valid character.

        >perl -E"say ord chr 4294967295"
        4294967295

        Perl uses utf8 (not to be confused with UTF-8), an encoding whose charset consists of 2**72 characters. Only up to UVMAX is supported, though.
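
The distinction can be sketched with the core `Encode` module, which exposes both names: the lax `utf8` encoding and the strict `UTF-8` one. Per the `Encode` documentation, the strict variant rejects code points outside the Unicode range when asked to croak, while the lax one accepts them. The helper name `encodes_ok` is hypothetical.

```perl
use strict;
use warnings;
no warnings 'utf8';   # avoid noise about above-Unicode code points
use Encode qw(encode);

# Hypothetical helper: can encoding $enc represent the character numbered $n?
sub encodes_ok {
    my ($enc, $n) = @_;
    my $str = chr $n;   # a copy, since a CHECK argument may modify its input
    return eval { encode($enc, $str, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

for my $enc ('utf8', 'UTF-8') {
    printf "%-5s 0x101: %d  0x200000: %d\n",
        $enc, encodes_ok($enc, 0x101), encodes_ok($enc, 0x200000);
}
```

Only the strict encoding is expected to refuse 0x200000; both should accept 0x101.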

        Unicode support in Perl is broken.

        I'm not going to discuss this because this thread has nothing to do with Unicode.

        The OP tried to send non-bytes to a file handle, and you tried to store something larger than a byte in a byte. A warning and dying aren't unwarranted.

        Unicode support in perl is broken.

        That isn’t even vaguely true, let alone concretely true. Just because one person does not understand something, or because another person doesn’t like something, does in no fashion mean that that something is somehow “broken”. To claim otherwise is tantamount to spreading leyendas negras and perilously close to spreading FUD. We need neither of those.

        Having fought my way through the many, many ways that Unicode does not work properly in various other languages like Java, C#, Python, Ruby, PHP, and JavaScript, not to mention the original misguided implementation of Unicode support from Perl 5.6 that’s been thankfully redesigned since then, I am completely confident that Perl’s Unicode support is not only not broken, but also that the Unicode support in Perl is superior to that in all those languages I’ve just mentioned.

        Now, it is actually true that Unicode support has improved in the 5.14 release of Perl. However, Unicode support in Perl has been perfectly serviceable for many years now. To pretend that it is “broken” may be misunderstanding, it may be disagreement, and it may be bitter bluster, but it is simply and fundamentally not true.

        It is also misleading and harmful to hear repeated. It helps nothing and only hurts people, people who may be naïvely deceived by this facile deceit. Here is what you should do instead:

        • If you think it should work differently, then submit a patch.
        • If you think there is a bug, then file a bug report.
        • If you are unwilling to take either of those two constructive steps, then please do the world the courtesy of not repeating a simple-minded slogan that is so patently false, misleading, and hurtful.

        Those are the only reasonable choices. If none of those “appeals” to you, then please gain some proper perspective by seriously trying out those other languages’ implementations of Unicode support. Who knows, you might even like them better than you do Perl’s.

        If it irks you to paddle upstream all the time, then turn around and go the other way. Save yourself some grief — and the rest of us, too.
