PerlMonks  

Re^2: Encoding Hell

by kettle (Beadle)
on Aug 10, 2006 at 02:11 UTC ( [id://566529]=note )


in reply to Re: Encoding Hell
in thread Encoding Hell

"Whatever display tool you are using to view the data as it arrives (and just what are you using to view the data?), it's that tool which is applying the "conversion" (the interpretation of the octet stream) that you find so confusing."

This is not precisely true - and I never said I found it confusing... It does matter that whatever one uses to view the data be set to the same encoding that the output has been set to, but this is not the whole story. The byte stream must also be decoded properly, i.e. it must match the encoding at the source - otherwise perl makes assumptions about the input byte stream. After that one can make changes according to one's 'display tool', but leaving a shift-jis encoded byte stream as is, and then expecting the unicode decoding of this stream to work properly is not Ok. It is clear from the code that this is understood but the wording of this post unnecessarily obfuscates the fact that perl has default settings which are not always appropriate. I don't really know why this post turned so negative; but I guess it must be my fault. Anyway, the problem, as mentioned a ways above, is long solved, so I guess I shan't be harking back again.

Replies are listed 'Best First'.
Re^3: Encoding Hell
by graff (Chancellor) on Aug 10, 2006 at 17:44 UTC
    The byte stream must also be decoded properly ...

    That's the point that rhesa and I were making, and which was absent in the OP code.

    ... otherwise perl makes assumptions about the input byte stream.

    Well, if you want to put it in those terms, you could say "perl assumes that whatever byte stream comes in, that is what will be printed" (unless your script specifically applies some other interpretation or conversion, either using Encode or via a PerlIO encoding layer on the output file handle).
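A minimal sketch of that pass-through behaviour, and of what changes once PerlIO encoding layers are applied. The sample octets (the two characters 日本 in Shift-JIS) and the in-memory file handles are assumptions made for illustration; they are not taken from the OP code.

```perl
use strict;
use warnings;

# Hypothetical sample data: the characters U+65E5 U+672C as Shift-JIS octets.
my $sjis_bytes = "\x93\xFA\x96\x7B";

# 1. No encoding layers: what comes in is what goes out, octet for octet.
open my $in, '<:raw', \$sjis_bytes or die "open for read failed: $!";
my $raw = do { local $/; <$in> };
close $in;

my $passthrough = '';
open my $out, '>:raw', \$passthrough or die "open for write failed: $!";
print {$out} $raw;
close $out;

# 2. Explicit layers: decode Shift-JIS on input, encode UTF-8 on output.
open $in, '<:encoding(shiftjis)', \$sjis_bytes or die "open for read failed: $!";
my $text = do { local $/; <$in> };    # now a string of 2 characters
close $in;

my $converted = '';
open $out, '>:encoding(UTF-8)', \$converted or die "open for write failed: $!";
print {$out} $text;
close $out;

printf "pass-through: %d octets, unchanged: %s\n",
    length $passthrough, ($passthrough eq $sjis_bytes ? 'yes' : 'no');
printf "decoded: %d characters, re-encoded: %d octets\n",
    length $text, length $converted;
```

With real files the same layers would go in the mode argument of open, or be added to an already open handle with binmode.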

    leaving a shift-jis encoded byte stream as is, and then expecting the unicode decoding of this stream to work properly is not Ok

    I'm not sure what you're talking about here. If you know you have shift-jis data, and you want to convert it to unicode, that's definitely okay, so long as you actually apply some process to do that (perl won't do it "implicitly").
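To illustrate "actually apply some process", here is a small sketch using the Encode module directly. The sample octets are a hypothetical stand-in for real Shift-JIS input, and the strict-failure flags are one possible choice rather than anything the thread prescribes. The last step shows what happens if the same octets are instead treated as UTF-8, which is presumably the failure being described.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: U+65E5 U+672C encoded as Shift-JIS.
my $sjis_bytes = "\x93\xFA\x96\x7B";

# Explicit decode: octets -> Perl character string.
# FB_CROAK dies on malformed input; LEAVE_SRC keeps $sjis_bytes intact.
my $text = decode('shiftjis', $sjis_bytes,
                  Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Explicit encode: character string -> UTF-8 octets for output.
my $utf8_bytes = encode('UTF-8', $text);

# Decoding with the wrong encoding does not fail by default;
# invalid sequences are replaced with U+FFFD.
my $wrong = decode('UTF-8', $sjis_bytes);

print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
print "mis-decoded as UTF-8: ",
    join(' ', map { sprintf 'U+%04X', ord } split //, $wrong), "\n";
```

Nothing here happens implicitly: each conversion is a call the script makes, and leaving either call out leaves the data as raw octets.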

    (update: I just remembered something: in case you happen to be running Perl 5.8.0 on a Red Hat 9 system, then there is a good chance that your defaults include a "locale" setting, which, on that combination of Perl/OS versions, caused Perl to make an implicit ("default") attempt to coerce input/output data between unicode and the encoding implied by the locale. This murdered countless applications and was fixed in later versions of Perl. If this is your situation, it's long past time to upgrade.)

    It is clear from the code that this is understood but the wording of this post unnecessarily obfuscates the fact that perl has default settings which are not always appropriate.

    Again, this is a bit hard to follow... which code are you referring to here? Which wording is obfuscating? Of course default settings are not always appropriate -- that's why there are alternatives to default settings...

    I don't really know why this post turned so negative;

    Me neither. That first reply (and its subthread) really threw me. If anything I said seemed negative, I apologize for that -- I generally try to keep my tone neutral, but of course I don't always succeed.

    (updated to fix typos)
