PerlMonks  

Hi everyone, I have two related questions on the handling of UTF-8 characters.

1: How do you write a script that can take user input via a variable and write it to a UTF-8 text file? Here's what I have (the commented lines are my subsequent attempts to correct the issue, which failed).

#!/usr/bin/perl
#use utf8;
print "Text? ";
chomp ($note = <STDIN>);
print "\nText: ${note}";
#open(TEST, ">>:encoding(UTF-8)", "test.txt") or die "Can't open UTF-8 encoded file: $!";
open(TEST, ">>", "test.txt") or die "Can't open file: $!";
print TEST "\nDirectly from the script: áéíóö&#337;úü&#369;\n";
print TEST "\nUser input via variable: $note\n";
close TEST;
<STDIN>;

Note: in the code above, the ő and ű letters show up as HTML character codes in this post. Character encoding strikes again.

As you can see, it takes user input and writes it to a file, along with some accented characters that are hardcoded into the script. The script itself is saved in UTF-8 to allow the use of all accented characters. It works fine on Ubuntu but fails on XP for me. On XP, the characters are printed correctly in the command line window by the print "\nText: ${note}"; line but they are corrupted in the file. The hardcoded stuff is fine, but if I type in the same accented letters when the script runs, they are mis-encoded.

By the way, the larger script this is a part of also reads accented characters from a UTF-8 file and writes them to another file, and that works fine on both Ubuntu and XP. So, essentially, I only have trouble with non-ASCII characters when they come from user input, are stored in a variable, and are written to a file from there. Any ideas?
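One possible explanation for the symptoms above, offered as a sketch rather than a confirmed diagnosis: the XP console may hand the script bytes in a legacy code page, while the file is expected to contain UTF-8. The example below decodes the input bytes into Perl characters and lets an output layer encode them as UTF-8. The code page `cp852`, the helper `decode_console_input`, and the simulated input bytes are all assumptions made for illustration; the real console code page has to be checked (for example with `chcp`).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8
use Encode qw(decode);

# ASSUMPTION: the console delivers input in this code page. cp852 is only
# an example value; on a UTF-8 terminal (e.g. Ubuntu) it would be 'UTF-8'.
my $console_encoding = 'cp852';

# Hypothetical helper: turn raw console bytes into a Perl character string.
sub decode_console_input {
    my ($raw_bytes, $encoding) = @_;
    return decode($encoding, $raw_bytes);
}

# Simulated input: the two bytes a cp852 console would send for "áé".
# In the real script this would be the chomped line read from STDIN.
my $raw  = "\xA0\x82";
my $note = decode_console_input($raw, $console_encoding);

# The :encoding(UTF-8) layer converts characters to UTF-8 bytes on output.
open(my $out, '>>:encoding(UTF-8)', 'test.txt')
    or die "Can't open file: $!";
print {$out} "\nUser input via variable: $note\n";
close $out;
```

In an interactive script, the same effect could be had by calling `binmode(STDIN, ":encoding($console_encoding)")` before reading, so that `<STDIN>` returns decoded characters directly. Note that once the output handle has an encoding layer, hardcoded accented literals need `use utf8;` as well, otherwise they would be encoded twice.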

2: I'm trying to get Spreadsheet::WriteExcel to work on UTF-8 files, and it's not looking very good. Here's my code for writing all lines of a file into Column A of a new spreadsheet:

#!/usr/bin/perl
use warnings;
use Spreadsheet::WriteExcel;

# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel->new('perl.xls');

# Add a worksheet
$worksheet = $workbook->add_worksheet;

# write file to column A
open (IN, "column1.txt");
$count = 0;
while (<IN>) {
    $count ++;
    chomp ($_);
    $worksheet->write("A$count", $_);
}
close IN;
<STDIN>;

I've been trying to read up on whether and how Spreadsheet::WriteExcel can handle UTF-8 characters, but I found no clear info. (Spreadsheet::WriteExcel: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm ; its info on Unicode: http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm#UNICODE_IN_EXCEL - this seems to say what I'm trying should work - I have Perl 5.10)

This code does what I want on both Ubuntu and XP (the xls is created with the right content), but accented characters are corrupted on both OSes.
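A minimal sketch of one thing worth trying, assuming the corruption comes from the input being read as raw bytes: the Unicode section of the Spreadsheet::WriteExcel documentation expects strings that Perl already knows to be UTF-8, which a plain `open` does not provide. Opening the input with an `:encoding(UTF-8)` layer decodes each line into characters before it reaches `write`. The file names are the ones from the code above; that `column1.txt` is really saved as UTF-8 is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::WriteExcel;

# Create a new Excel workbook and add a worksheet.
my $workbook  = Spreadsheet::WriteExcel->new('perl.xls');
my $worksheet = $workbook->add_worksheet;

# ASSUMPTION: column1.txt is UTF-8. The input layer decodes its bytes into
# Perl characters, so each line is a character string, not raw bytes.
open(my $in, '<:encoding(UTF-8)', 'column1.txt')
    or die "Can't open column1.txt: $!";

# Write each line of the file into column A, one row per line.
my $count = 0;
while (my $line = <$in>) {
    $count++;
    chomp $line;
    $worksheet->write("A$count", $line);
}
close $in;

# Close explicitly so the workbook is flushed to disk.
$workbook->close();
```

The only functional changes from the original are the encoding layer on the input handle, the error check on `open`, and the explicit `close` on the workbook.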

Thanks for any help!


In reply to UTF-8 issues with Perl in general and with Spreadsheet::WriteExcel by elef
