Re^2: How to sanely handle unicode in perl?

by Sec (Monk)
on Mar 20, 2015 at 16:46 UTC


in reply to Re: How to sanely handle unicode in perl?
in thread How to sanely handle unicode in perl?

This also does not solve my problem. I want perl to respect the locale of the user calling the script.

If I use your "open" statement and run the script in an ISO-8859-1 terminal, I get the following:

karoshi:~>LC_CTYPE=de_DE.ISO-8859-1 ./u8demo.pl
I read a line, that is 1 chars long.
That line is: ö
That line in ascii is: o
which is clearly incorrect.

Re^3: How to sanely handle unicode in perl?
by Your Mother (Archbishop) on Mar 20, 2015 at 16:50 UTC

    See point 14 under Assume Brokenness in the link I gave: “Code that assumes Unicode gives a fig about POSIX locales is broken.”

      I do not assume Unicode. I just want to handle data correctly. perl is apparently unable to output data in the way its environment requires.

      The frustrating part is that perl looks like it is equipped to work. It is _able_ to do output conversion on the fly. It is just not able to do it correctly without user intervention.
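One way that "user intervention" might look, as a rough sketch only: ask the C library which character set the caller's locale uses, then put a matching `:encoding(...)` layer on the standard handles. The helper names `encoding_for_codeset` and `locale_encoding_name` are made up for this example, and the fallback to UTF-8 for an unrecognised codeset is an assumption of the sketch, not something perl does by itself. It relies on the core modules POSIX, I18N::Langinfo and Encode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Map a codeset name reported by the locale to a name Encode knows.
# Hypothetical helper; the UTF-8 fallback is this sketch's own choice.
sub encoding_for_codeset {
    my ($codeset) = @_;
    my $enc = (defined $codeset && length $codeset)
        ? find_encoding($codeset)
        : undef;
    return $enc ? $enc->name : 'utf-8-strict';
}

# Query the caller's locale (LC_ALL / LC_CTYPE / LANG) for its codeset.
sub locale_encoding_name {
    setlocale(LC_CTYPE, '');                 # adopt the environment's locale
    return encoding_for_codeset(langinfo(CODESET));
}

# Decode input and encode output according to the caller's locale.
my $name = locale_encoding_name();
binmode($_, ":encoding($name)") for \*STDIN, \*STDOUT, \*STDERR;
```

With this in place, running the script under `LC_CTYPE=de_DE.ISO-8859-1` should select a Latin-1 layer, provided that locale is actually installed on the machine. The `open` pragma's `:locale` option (`use open qw(:std :locale);`) aims at the same thing with less code.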

        \xc3\xb6 is not the right byte sequence for an ö from a Latin-1 terminal; it is the UTF-8 encoding, so it can only come from a UTF-8 encoded source (and still mean ö). From a Latin-1 source it would be \xf6. So what you are asking to do sanely strikes me as… strange. To do encoding properly you have to know what you are receiving and decode it as that, then know what your output layer expects and encode to that. It’s not easy, but it’s not magical either. Without the right steps at the right layers it is guesswork, and impossible to do robustly.
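The decode-then-encode steps described above can be sketched with the core Encode module. The variable names and the `hex_bytes` helper are only illustrative; the point is that the same character arrives as different bytes depending on the source, and that decoding with the wrong encoding silently produces a different string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative helper: show a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, $_[0] }

# The same letter as it arrives from two differently encoded sources.
my $from_utf8   = "\xc3\xb6";   # bytes sent by a UTF-8 source
my $from_latin1 = "\xf6";       # byte sent by a Latin-1 source

# Decode each input with the encoding it was actually written in.
my $char_a = decode('UTF-8',      $from_utf8);
my $char_b = decode('ISO-8859-1', $from_latin1);

# Both are now the same one-character string, U+00F6.
printf "equal: %s, length: %d\n",
    ($char_a eq $char_b ? 'yes' : 'no'), length $char_a;

# Guessing wrong: UTF-8 bytes read as Latin-1 become two characters.
my $misread = decode('ISO-8859-1', $from_utf8);
printf "misread length: %d\n", length $misread;

# Encode for whatever the output layer expects.
my $for_latin1 = encode('ISO-8859-1', $char_a);
my $for_utf8   = encode('UTF-8',      $char_b);
printf "latin-1 out: %s\n", hex_bytes($for_latin1);
printf "utf-8 out:   %s\n", hex_bytes($for_utf8);
```

In a real script the explicit decode and encode calls would normally be replaced by `:encoding(...)` layers on the input and output handles, but the two steps stay the same.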

        I do not assume Unicode.
        I think you misparsed that sentence:
        “Code that assumes Unicode gives a fig about POSIX locales is broken.”
        This is not
        (Code that assumes Unicode) gives a fig about POSIX locales is broken.
        but
        Code that assumes (Unicode gives a fig about POSIX locales) is broken.
        Update: perhaps I should point out that we seem to share the same native language.
