PerlMonks  

Re^2: Mail::Sender character set/encoding

by Rodster001 (Pilgrim)
on Mar 12, 2009 at 20:26 UTC [id://750255]


in reply to Re: Mail::Sender character set/encoding
in thread Mail::Sender character set/encoding

Hey, thanks, that was it (latin1). Is there a universal way of determining a character set being used (e.g. in a given email, Word document, text file, etc.)? Or do you have to poke around or just "know"? Thanks again!

Re^3: Mail::Sender character set/encoding
by almut (Canon) on Mar 12, 2009 at 20:36 UTC
    Is there a universal way of determining a character set being used

    You can try Encode::Guess, but generally it's better to "just know", as there's no easy failsafe way to determine encodings. In other words, the module can tell UTF-8 apart from ISO-8859-1, but has a hard time figuring out whether something is ISO-8859-1 or ISO-8859-15... (typically, you'd also need to specify potential candidates as hints, e.g. via ->set_suspects())
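
    A minimal sketch of what that looks like in practice, assuming the stock Encode::Guess that ships with Perl. The helper name guess_name and the sample strings are made up for illustration. Here the hints are passed per call to guess_encoding(); ->set_suspects() would set them globally instead.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns the name of the guessed encoding, or undef
# when the guess fails or is ambiguous.
sub guess_name {
    my ($octets, @suspects) = @_;
    my $enc = guess_encoding($octets, @suspects);
    # A successful guess is an encoding object; a failure is a plain
    # string explaining what went wrong.
    return ref($enc) ? $enc->name : undef;
}

my $utf8_bytes   = "caf\xC3\xA9 au lait";   # e-acute encoded as UTF-8
my $latin1_bytes = "caf\xE9 au lait";       # e-acute encoded as ISO-8859-1

for my $case (
    [ 'UTF-8 bytes, default suspects',        $utf8_bytes ],
    [ 'Latin-1 bytes, latin1 hint',           $latin1_bytes, 'latin1' ],
    [ 'Latin-1 bytes, latin1 + latin9 hints', $latin1_bytes, 'latin1', 'iso-8859-15' ],
) {
    my ($label, $octets, @suspects) = @$case;
    my $name = guess_name($octets, @suspects);
    printf "%-40s %s\n", $label, defined $name ? $name : '(no unique guess)';
}
```

    The last case is the awkward one: every byte is valid in both ISO-8859-1 and ISO-8859-15, so the module has nothing to distinguish them by and should report no unique guess.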

      So, to "just know", is it a matter of being familiar with the different character encodings and the context of the document in question (for example, a text file created with vi on my machine, an email from China, or a Word attachment from Zimbabwe)?

      I took a look at this page: http://en.wikipedia.org/wiki/Character_encoding. Most of that is fairly familiar to me. It did add a little to the confusion that latin1 or us-ascii wasn't mentioned on that page (OK, I know Wikipedia is not the definitive source).

      I suppose like anything else, experience is valuable. So after this, I guess I know a little more.

        Ideally, there is meta info, either explicit (as in a Content-Type header, etc.) or implicit (such as where the data originated from, as you're saying).  For Unicode encodings, there's also the BOM (byte order mark).
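
        As a rough sketch of checking for that meta info first, assuming you already have the raw Content-Type value and the raw bytes in hand: the two function names below are hypothetical, and the header regex is deliberately simple rather than a full MIME parser.

```perl
use strict;
use warnings;

# Pull the charset parameter out of a Content-Type header value, if any.
sub charset_from_content_type {
    my ($value) = @_;
    if ($value =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i) {
        return lc $1;
    }
    return;
}

# Map a leading byte order mark to a Unicode encoding name.
# UTF-32 is checked before UTF-16 because the UTF-32LE mark starts
# with the same two bytes as the UTF-16LE mark.
sub encoding_from_bom {
    my ($octets) = @_;
    my @boms = (
        [ "\x00\x00\xFE\xFF" => 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00" => 'UTF-32LE' ],
        [ "\xEF\xBB\xBF"     => 'UTF-8'    ],
        [ "\xFE\xFF"         => 'UTF-16BE' ],
        [ "\xFF\xFE"         => 'UTF-16LE' ],
    );
    for my $bom (@boms) {
        my ($mark, $name) = @$bom;
        return $name if substr($octets, 0, length $mark) eq $mark;
    }
    return;
}

my $charset = charset_from_content_type('text/plain; charset="ISO-8859-1"');
my $by_bom  = encoding_from_bom("\xEF\xBB\xBFhello");
print "header says: ", (defined $charset ? $charset : 'nothing'), "\n";
print "BOM says:    ", (defined $by_bom  ? $by_bom  : 'nothing'), "\n";
```

        Both functions return nothing when the information is missing, which is the point where guessing or context has to take over.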

        In case there isn't, we humans (as opposed to computers) typically excel at figuring out what encoding is being used (at least if we understand the language the text is in), mainly because our mind has abilities and resources (like world knowledge) that AI is still struggling with...  As we can usually tell that only certain characters make sense in certain positions, we can check their byte values against what's documented, and arrive at a decision (or at least an educated guess) rather soon.
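
        That manual check can be sketched in code too: take a character you expect at some position, encode it under each candidate, and compare with the bytes actually seen. This is only an illustration using the core Encode module; the helper names and the candidate list are my own choices.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated hex, for comparing with a code chart.
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $octets;
}

# Hypothetical helper: which candidates encode $char as exactly the observed bytes?
sub matching_encodings {
    my ($char, $observed, @candidates) = @_;
    return grep { encode($_, $char) eq $observed } @candidates;
}

my @candidates = qw(iso-8859-1 iso-8859-15 cp1252);

for my $case (
    [ 'e-acute seen as',   "\x{E9}",   "\xE9" ],
    [ 'euro sign seen as', "\x{20AC}", "\xA4" ],
) {
    my ($label, $char, $observed) = @$case;
    my @hits = matching_encodings($char, $observed, @candidates);
    printf "%-18s %s -> %s\n", $label, hex_bytes($observed),
        join(', ', @hits) || 'none';
}
```

        A character like e-acute sits at the same byte in all three candidates, so it tells you little; a character that differs between them, like the euro sign, is what settles it.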

Re^3: Mail::Sender character set/encoding
by zwon (Abbot) on Mar 12, 2009 at 20:35 UTC

    There are some modules that can detect text encoding, but generally you should just know.
