The byte stream must also be decoded properly ...
That's the point that rhesa and I were making, and it was exactly what was missing from the OP's code.
... otherwise perl makes assumptions about the input byte stream.
Well, if you want to put it in those terms, you could say that perl assumes that whatever byte stream comes in is what will be printed, unless your script specifically applies some other interpretation or conversion (either with Encode or via a PerlIO encoding layer on the output file handle).
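A minimal sketch of that default, assuming a plain script with no "use open" pragma and no locale-driven layers. In-memory filehandles stand in for real files, and the variable names are made up for illustration. The Shift-JIS bytes go through a plain handle untouched, while an :encoding(UTF-8) layer on the output handle is what turns characters into UTF-8 bytes.

```perl
use strict;
use warnings;

# Shift-JIS bytes for U+65E5 U+672C; perl has no idea they are Shift-JIS.
my $sjis_bytes = "\x93\xFA\x96\x7B";

# Plain handle, no Encode, no layer: the bytes go out exactly as they came in.
open my $plain_out, '>', \my $plain_buf or die "open: $!";
print {$plain_out} $sjis_bytes;
close $plain_out;

# A PerlIO encoding layer on the output handle converts *characters* to UTF-8.
my $chars = "\x{65E5}\x{672C}";
open my $utf8_out, '>:encoding(UTF-8)', \my $utf8_buf or die "open: $!";
print {$utf8_out} $chars;
close $utf8_out;

printf "plain: %s\n", unpack('H*', $plain_buf);  # 93fa967b
printf "utf-8: %s\n", unpack('H*', $utf8_buf);   # e697a5e69cac
```

Nothing in the first half "knows" the data is Shift-JIS; any conversion only happens where the script asks for it.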
leaving a shift-jis encoded byte stream as is, and then expecting the unicode decoding of this stream to work properly is not Ok
I'm not sure what you're talking about here. If you know you have Shift-JIS data and you want to convert it to Unicode, that's definitely okay, so long as you actually apply some process to do that (perl won't do it "implicitly").
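Here is a rough sketch of what "actually apply some process" can look like, using Encode's decode and encode. It assumes the input really is Shift-JIS; the sample bytes and variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Shift-JIS bytes for U+65E5 U+672C, as they might arrive from a file or socket.
my $sjis = "\x93\xFA\x96\x7B";

# Skipping this step is the bug: perl would treat the 4 bytes as 4 characters.
my $text = decode('shiftjis', $sjis);

# Now $text holds 2 characters and can be re-encoded however you need.
my $utf8 = encode('UTF-8', $text);

printf "bytes in: %d, chars: %d, UTF-8 bytes out: %d\n",
    length $sjis, length $text, length $utf8;   # 4, 2, 6
print join(' ', map { sprintf('U+%04X', ord($_)) } split //, $text), "\n";  # U+65E5 U+672C
```

The same conversion can also happen at read time instead, by opening the input file with a "<:encoding(shiftjis)" layer so each read returns decoded characters.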
(update: I just remembered something: if you happen to be running Perl 5.8.0 on a Red Hat 9 system, there is a good chance that your defaults include a "locale" setting, which, on that combination of Perl/OS versions, caused Perl to make an implicit ("default") attempt to coerce input/output data between Unicode and the encoding implied by the locale. This murdered countless applications and was fixed in later versions of Perl. If this is your situation, it's long past time to upgrade.)
It is clear from the code that this is understood but the wording of this post unnecessarily obfuscates the fact that perl has default settings which are not always appropriate.
Again, this is a bit hard to follow... which code are you referring to here? Which wording is obfuscating? Of course default settings are not always appropriate -- that's why there are alternatives to default settings...
I don't really know why this post turned so negative;
Me neither. That first reply (and its subthread) really threw me. If anything I said seemed negative, I apologize for that -- I generally try to keep my tone neutral, but of course I don't always succeed.
(updated to fix typos)