PerlMonks  

Re^3: Reading CSV Files Containing UTF8 Characters

by graff (Chancellor)
on Nov 09, 2007 at 04:31 UTC


in reply to Re^2: Reading CSV Files Containing UTF8 Characters
in thread Reading CSV Files Containing UTF8 Characters

Hmm... I thought Outlook was an email program, so I wonder about the circumstances in which it is used to "export" a CSV file. If someone emailed you a CSV file as an attachment, you would have to hope that the sender can tell you which character encoding they used. If you can't get that from them, you would have to use Encode::Guess with more candidates than just cp1252 and "latin1". (Alas, guessing is relatively unreliable when it comes to picking the "right" encoding among the various single-byte Latin alternatives, since almost any byte sequence is valid in all of them.)
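To illustrate why guessing struggles here, a minimal sketch with Encode::Guess. The sample string is made up for the example; the point is only that when more than one single-byte candidate fits, guess_encoding hands back a message instead of an encoding object.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical sample: "cafe" with an accented e stored as the single byte 0xE9.
my $octets = "caf\xE9";

# guess_encoding returns an encoding object when exactly one candidate fits,
# or a plain string describing the problem otherwise.
my $guess = guess_encoding($octets, qw(cp1252 iso-8859-1));

if (ref $guess) {
    print "Looks like ", $guess->name, "\n";
}
else {
    # Both candidates accept byte 0xE9, so the guess is ambiguous.
    print "Could not decide: $guess\n";
}
```

Both candidates decode the byte without complaint, so the module has nothing to choose between them.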

Or you'll have to inspect the data file yourself to see if you can deduce the encoding. Any decent hex-dump tool will do (to show the byte values of the non-ASCII characters), along with knowledge of the language used in the text and some reference info from http://www.unicode.org/Public/MAPPINGS/ (it's an ftp-able directory of mapping tables that relate the various non-Unicode character sets to Unicode).
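If no hex-dump tool is handy, a few lines of Perl can list the non-ASCII byte values directly. This is only a sketch: the sub name non_ascii_counts and the sample data are invented for illustration, and a real run would read the file's raw bytes instead.

```perl
use strict;
use warnings;

# Count each non-ASCII byte value (0x80-0xFF) in a string of raw octets.
sub non_ascii_counts {
    my ($octets) = @_;
    my %count;
    $count{ ord $1 }++ while $octets =~ /([\x80-\xFF])/g;
    return \%count;
}

# Hypothetical sample data. For a real file, slurp it in raw mode first:
#   open my $fh, '<:raw', $path or die "$path: $!";
#   my $octets = do { local $/; <$fh> };
my $sample = "Name,City\nJos\xE9,M\xE1laga\nRen\xE9e,Orl\xE9ans\n";

my $counts = non_ascii_counts($sample);
printf "0x%02X  %d\n", $_, $counts->{$_}
    for sort { $a <=> $b } keys %$counts;
```

For the sample above this reports 0xE1 once and 0xE9 three times, which gives you the byte values to look up.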

My inclination would be: download those Unicode mapping tables into a single directory, look at a hex-dump of your CSV file to see which non-ASCII byte values to look up, figure out what letter each byte value represents, and grep over the mapping tables to find the line that relates that byte value to that letter.

The name of the mapping table containing that line tells you the character encoding to use when opening the CSV file.
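As a shortcut for the lookup step, you could ask Perl's own Encode tables what a given byte means under each candidate encoding, rather than grepping the downloaded files. This swaps in Encode and charnames for the mapping tables, so treat it as a rough sketch: the sub name byte_meanings and the candidate list are assumptions made for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Return the Unicode character name that each candidate encoding
# assigns to a single byte value.
sub byte_meanings {
    my ($byte, @candidates) = @_;
    my %meaning;
    for my $enc (@candidates) {
        my $octet = chr $byte;              # one-byte string
        my $char  = decode($enc, $octet);   # that byte as a character
        $meaning{$enc} = charnames::viacode(ord $char) || 'unnamed';
    }
    return \%meaning;
}

# Illustrative candidates; extend the list to suit the suspected language.
my @candidates = qw(iso-8859-1 cp1252 iso-8859-5 iso-8859-7);
my $meaning = byte_meanings(0xE9, @candidates);
printf "%-12s %s\n", $_, $meaning->{$_} for @candidates;
```

Whichever encoding yields the letter you expect at that spot in the text is the one to name in the open call, along the lines of open my $fh, '<:encoding(cp1252)', $file.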
