PerlMonks  

Re^2: parsing non english

by arcnon (Monk)
on Nov 08, 2007 at 17:30 UTC


in reply to Re: parsing non english
in thread parsing non english

It comes from an Access database. Some Japanese fellows translated some information for a doctor, but they placed all the translated names in one field... It has fallen to me to break it up and insert it into a new database.
Being a lazy American, I can barely speak English. I didn't load any foreign charsets, so I assume I am not seeing a true representation.
Honestly, is this info Unicode? I don't have the slightest idea.
I'm just guessing at the comma character based on what I was told it was, then viewing that character in a hex editor.

Replies are listed 'Best First'.
Re^3: parsing non english
by moritz (Cardinal) on Nov 08, 2007 at 17:46 UTC
    Well, first you have to find out the encoding. Otherwise the data is just binary garbage to you and your programs.

    I'd suggest asking the people who produced the data.

    There are a few other possibilities; for example, the text editor vim has decent charset autodetection.

    You can also try Encode::Guess, but you have to provide it with a list of possible encodings. Try to find out which encodings are used in Japan on Windows.
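
    A minimal sketch of the Encode::Guess approach, under some assumptions: the candidate list (cp932 and euc-jp) is only a guess at what Japanese Windows software might have written, the sample bytes are made up for illustration, and describe_guess is a hypothetical helper name. Note that guess_encoding returns an encoding object on success and a plain error string when it cannot decide, so the result has to be checked with ref.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: returns the name of the guessed encoding, or a
# message explaining why no single encoding could be chosen.
sub describe_guess {
    my ($bytes, @suspects) = @_;

    # ascii and utf8 are always tried; @suspects adds further candidates.
    my $enc = guess_encoding($bytes, @suspects);

    # On failure or ambiguity a plain string comes back, not an object.
    return ref($enc) ? $enc->name : "undecided: $enc";
}

# Made-up sample: two kanji (U+7530 U+4E2D) encoded as cp932 bytes,
# standing in for the contents of the database field.
my $sample = encode('cp932', "\x{7530}\x{4E2D}");

# The suspect list is an assumption; adjust it to the encodings you
# actually expect. Similar encodings may be reported as ambiguous.
print describe_guess($sample, qw(cp932 euc-jp)), "\n";
```

    If the answer comes back as "undecided", the candidates overlap for this data, and asking the people who produced it is still the more reliable route.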

    Once you know the charset, you can decode the data (with decode from the Encode module) and work with it.
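
    As a rough illustration of that last step, the sketch below decodes a field and splits it into names. Everything specific here is an assumption: cp932 as the encoding, the ideographic comma U+3001 (or a plain ASCII comma) as the separator, and the sample names themselves. split_names is a hypothetical helper, not something from the original data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw bytes in the given encoding, then
# split the resulting character string on commas.
sub split_names {
    my ($bytes, $encoding) = @_;

    # FB_CROAK makes decode die on malformed input instead of silently
    # substituting characters; LEAVE_SRC keeps $bytes unmodified.
    my $text = decode($encoding, $bytes,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());

    # Assumed separators: ideographic comma U+3001 or ASCII comma,
    # with optional surrounding whitespace.
    return grep { length } split /\s*[\x{3001},]\s*/, $text;
}

# Made-up field: two names joined by U+3001, encoded as cp932 bytes.
my $field = encode('cp932', "\x{7530}\x{4E2D}\x{3001}\x{5C71}\x{7530}");

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for split_names($field, 'cp932');
```

    The point is that the split happens on decoded characters, not raw bytes, so a separator is matched as a character rather than as a byte sequence that might also occur inside a multi-byte character.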
