Re^2: UTF-8 issues with Perl in general and with Spreadsheet::WriteExcel

Well, I only use STDIN to get user input, which is always just one line, which I store in a variable and then use it for whatever purpose later... So I don't see how a while loop would be useful. Anyway, the more I know about this stuff, the less I understand it. I tried just adding binmode STDIN, ':encoding(UTF-8)'; to the script above, and now I get a different problem, namely error messages of this sort:

utf8 "\xFB" does not map to Unicode at [script] line 8.

The output file contains the character codes instead of the characters:

\xFB\x{32CB8E1}\x82\xA0

Maybe I should be using encode() and decode() but I just don't know how they relate to "use utf8", and "binmode :encoding(UTF-8)". This is a huge mess and I feel like I'm having to fight a hundred dragons just to get some damned characters to display correctly. Why everything isn't in UTF-8 in the first place is beyond me, it's 2010 for God's sake!

Anyway, I ran the test from your link ( http://perlgeek.de/en/article/encodings-and-unicode ) as well. The results are not good: all 4 lines are mojibake. The dragons are clearly winning.


Re^3: UTF-8 issues with Perl in general and with Spreadsheet::WriteExcel
by moritz (Cardinal) on Jul 16, 2010 at 16:52 UTC

> So I don't see how a while loop would be useful

It was an example, with the purpose of demonstrating that you need to set the IO layer only once, and not before every reading operation. Of course you are welcome to deviate from the example.
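A minimal sketch of that point, with no loop at all. To keep it runnable without a terminal, an in-memory filehandle stands in for STDIN; the handle name $in and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for STDIN (an assumption for this sketch): two lines of
# UTF-8 encoded bytes, "caf\xC3\xA9" (café) and "\xC3\xBCber" (über).
my $bytes = "caf\xC3\xA9\n\xC3\xBCber\n";
open my $in, '<', \$bytes or die "Cannot open in-memory handle: $!";

# Set the IO layer once ...
binmode $in, ':encoding(UTF-8)';

# ... and every later read is decoded, with no further binmode calls.
chomp(my $first  = <$in>);
chomp(my $second = <$in>);

# Both strings are now 4 characters long, not 5 bytes.
printf "%d %d\n", length $first, length $second;
```

With the real STDIN, the same single binmode STDIN, ':encoding(UTF-8)'; near the top of the script covers every later read.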

> utf8 "\xFB" does not map to Unicode at script line 8.

That means that your input is not in UTF-8. Find out which character encoding it is, and use the name in the :encoding($encoding_name) IO layer.
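As a rough illustration of plugging a different name into the layer: the sketch below assumes the input is ISO-8859-1, which is only a guess made for the example and not something the error message proves. The variable $encoding_name and the in-memory handle are likewise stand-ins.

```perl
use strict;
use warnings;

# ASSUMPTION: the real encoding is unknown; ISO-8859-1 is used here
# only because the byte \xFB is a valid character (U+00FB) in it.
my $encoding_name = 'ISO-8859-1';

# Stand-in for the real input: the single byte from the error message.
my $bytes = "\xFB\n";
open my $in, "<:encoding($encoding_name)", \$bytes
    or die "Cannot open in-memory handle: $!";

chomp(my $line = <$in>);

# The byte is accepted and mapped to a code point instead of
# triggering "does not map to Unicode".
printf "U+%04X\n", ord $line;
```

If the decoded text still looks wrong, the guess was wrong; a DOS or Windows code page is another candidate worth trying when the input comes from a Windows console.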

> Maybe I should be using encode() and decode() but I just don't know how they relate to "use utf8", and "binmode :encoding(UTF-8)".

use utf8; has the same effect as adding a decode_utf8 before every string literal in your program: it tells perl that the source file itself is encoded in UTF-8, and it has no effect on input or output. The :encoding(UTF-8) IO layer has the same effect as wrapping input operations in decode calls and output operations in encode calls.
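To make the relationship concrete, here is a small sketch that reads the same bytes twice, once through the layer and once with a manual decode from the Encode module, and compares the results. The sample bytes and handle names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "naïve" as UTF-8 bytes (ï is \xC3\xAF); sample data for illustration.
my $bytes = "na\xC3\xAFve\n";

# Route 1: let the IO layer decode on input.
open my $layered, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $via_layer = <$layered>;

# Route 2: read raw bytes and decode by hand.
open my $raw, '<', \$bytes or die "open failed: $!";
my $via_decode = decode('UTF-8', scalar <$raw>);

print $via_layer eq $via_decode ? "same string\n" : "different strings\n";

# On output the layer does the reverse: it encodes characters to bytes.
my $out_bytes = encode('UTF-8', $via_decode);
print $out_bytes eq $bytes ? "round trip ok\n" : "round trip failed\n";
```

Pick one route per filehandle. Decoding by hand on a handle that already has an :encoding layer decodes twice and produces garbage.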

> The results are not good: all 4 lines are mojibake.

Then your next step should be either to find out which character encoding your terminal works with, or set it up to use UTF-8.
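One possible way to ask the system which encoding the terminal is expected to use, sketched under the assumption of a Unix-like system where the locale reflects the terminal setup (on Windows this approach may not apply). It uses the core modules POSIX, I18N::Langinfo and Encode; the fallback to UTF-8 is a choice made for this sketch.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
setlocale(LC_CTYPE, '');

# Ask for the character set name of that locale, e.g. "UTF-8".
my $codeset = langinfo(CODESET);
$codeset = '' unless defined $codeset;

# Check that Encode knows this name before using it in a layer;
# fall back to UTF-8 (an assumption of this sketch) if it does not.
my $enc  = length $codeset ? find_encoding($codeset) : undef;
my $name = $enc ? $enc->name : 'UTF-8';

binmode STDOUT, ":encoding($name)";
print "locale codeset: '$codeset', output layer: $name\n";
```

If the reported codeset is not the one you expected, the fix belongs in the terminal or locale settings and not in the script.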



