http://qs321.pair.com?node_id=849954

elef has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have two related questions on the handling of UTF-8 characters.

1: How do you write a script that can take user input via a variable and write it to a UTF-8 text file? Here's what I have (the commented lines are my subsequent attempts to correct the issue, which failed).

#!/usr/bin/perl
#use utf8;
print "Text? ";
chomp ($note = <STDIN>);
print "\nText: ${note}";
#open(TEST, ">>:encoding(UTF-8)", "test.txt") or die "Can't open UTF-8 encoded file: $!";
open(TEST, ">>", "test.txt") or die "Can't open file: $!";
print TEST "\nDirectly from the script: áéíóö&#337;úü&#369;\n";
print TEST "\nUser input via variable: $note\n";
close TEST;
<STDIN>;

Note: I'm getting HTML character codes instead of the ő and ű letters in this post in the code... Character encoding strikes again.

As you can see, it takes user input and writes it to a file, along with some accented characters that are hardcoded into the script. The script itself is saved in UTF-8 to allow the use of all accented characters. It works fine on Ubuntu but fails on XP for me. On XP, the characters are printed correctly in the command line window by the print "\nText: ${note}"; line but they are corrupted in the file. The hardcoded stuff is fine, but if I type in the same accented letters when the script runs, they are mis-encoded.

By the way, the larger script this is a part of also reads accented characters from a UTF-8 file and writes them to another file, and that works fine on both Ubuntu and XP. So, essentially, I only have trouble with non-ASCII characters when they come from user input, are stored in a variable, and are written to a file from there. Any ideas?
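
A minimal sketch of one possible explanation, not a confirmed diagnosis: the XP console probably hands the script bytes in a legacy code page rather than UTF-8, while the hardcoded text is already UTF-8 bytes because the script file is saved that way. If so, the typed input would need to be decoded from the console encoding and re-encoded as UTF-8 before it is written. The helper name `console_bytes_to_utf8` and the choice of `cp852` are assumptions made for illustration; the real console code page can be checked with `chcp`.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw console bytes into UTF-8 bytes.
# $console_encoding is an assumption; verify the actual code page with `chcp`.
sub console_bytes_to_utf8 {
    my ($raw, $console_encoding) = @_;
    my $text = decode($console_encoding, $raw);   # bytes -> Perl characters
    return encode('UTF-8', $text);                # characters -> UTF-8 bytes
}

# In cp852 the letter é is the single byte 0x82; in UTF-8 it is 0xC3 0xA9.
my $utf8 = console_bytes_to_utf8("\x82", 'cp852');
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
```

In the script above, the same effect could presumably be had by calling `binmode(STDIN, ':encoding(cp852)')` before reading and opening the output with `>>:encoding(UTF-8)`, in which case the hardcoded literals would also need `use utf8;` so that they are not encoded twice.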

2: I'm trying to get Spreadsheet::WriteExcel to work on UTF-8 files, and it's not looking very good. Here's my code for writing all lines of a file into Column A of a new spreadsheet:

#!/usr/bin/perl
use warnings;
use Spreadsheet::WriteExcel;

# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel->new('perl.xls');

# Add a worksheet
$worksheet = $workbook->add_worksheet;

# write file to column A
open (IN, "column1.txt");
$count = 0;
while (<IN>) {
    $count ++;
    chomp ($_);
    $worksheet->write("A$count", $_);
}
close IN;
<STDIN>;

I've been trying to read up on whether and how Spreadsheet::WriteExcel can handle UTF-8 characters, but I found no clear info. (Spreadsheet::WriteExcel: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm ; its info on Unicode: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm#UNICODE_IN_EXCEL - this seems to say what I'm trying should work - I have Perl 5.10)

This code does what I want on both Ubuntu and XP (the xls is created with the right content), but accented characters are corrupted on both systems.
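
A minimal sketch of what may be happening here, assuming the corruption comes from the input file being read as raw bytes: a plain `open (IN, "column1.txt")` gives Perl undecoded UTF-8 bytes, whereas opening with an `:encoding(UTF-8)` layer gives character strings. The temporary file and the helper `first_line` are hypothetical and exist only to show the difference; whether passing decoded strings to `write` fixes the spreadsheet is an assumption based on the Unicode section linked above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two UTF-8 bytes of the letter é to a temporary file.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "\xC3\xA9\n";
close $fh or die "Can't close $path: $!";

# Hypothetical helper: read the first line of a file through a given layer.
sub first_line {
    my ($file, $layer) = @_;
    open my $in, "<$layer", $file or die "Can't open $file: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

my $raw     = first_line($path, ':raw');              # undecoded bytes
my $decoded = first_line($path, ':encoding(UTF-8)');  # decoded characters

print "raw length: ",     length($raw),     "\n";
print "decoded length: ", length($decoded), "\n";
```

Applied to the script above, that would mean opening the input as `open(IN, "<:encoding(UTF-8)", "column1.txt")` so that each line reaches `$worksheet->write` as a decoded string.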

Thanks for any help!