Hey, thanks, that was it (latin1). Is there a universal way of determining which character set is being used (e.g. in a given email, Word document, text file, etc.)? Or do you have to poke around or just "know"? Thanks again!
Is there a universal way of determining a character set being used
You can try Encode::Guess, but generally it's better to "just
know", as there's no easy, failsafe way to determine encodings. In other words,
the module can tell UTF-8 apart from ISO-8859-1, but it has a hard time
figuring out whether something is ISO-8859-1 or ISO-8859-15. (Typically, you'd also need to specify potential candidates as hints, e.g. via ->set_suspects().)
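To make that limitation concrete, here is a minimal sketch using guess_encoding() from Encode::Guess. The sample bytes and the describe_guess() helper are made up for illustration; only guess_encoding() and the ->name method come from the module itself.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical sample: "café" as ISO-8859-1 octets.
# The lone 0xE9 byte is not valid UTF-8, so utf8 gets ruled out.
my $octets = "caf\xE9";

# Hypothetical helper: guess_encoding() returns an encoding object on
# success, or a plain string describing the problem when it cannot decide.
sub describe_guess {
    my ($data, @suspects) = @_;
    my $guess = guess_encoding($data, @suspects);
    return ref($guess) ? $guess->name : "undecided: $guess";
}

# One single-byte suspect: only ISO-8859-1 decodes the data cleanly.
my $single = describe_guess($octets, 'iso-8859-1');

# Two single-byte suspects: both decode every byte, so no winner emerges.
my $double = describe_guess($octets, 'iso-8859-1', 'iso-8859-15');

print "$single\n";
print "$double\n";
```

With one single-byte suspect the module should settle on it, but as soon as two ISO-8859 variants are suspects, both accept the bytes and the result is an "undecided" string naming both. That is why knowing where the data came from beats guessing.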
So, to "just know" is it a matter of being familiar with different character encoding sets and the context of the document in question (for example, a text file created with vi on my machine, an email from China, or a Word attachment from Zimbabwe)?
I took a look at this page: http://en.wikipedia.org/wiki/Character_encoding and most of that is fairly familiar to me. It did add a little to the confusion that neither latin1 nor us-ascii was mentioned on that page (OK, I know Wikipedia is not the definitive source).
I suppose like anything else, experience is valuable. So after this, I guess I know a little more.