What encoding am I (probably) using?
by tphyahoo (Vicar)
on May 13, 2005 at 12:25 UTC
tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:
O wise monks,
Let's say I want to process some text whose encoding is uncertain, except that it is probably text, and probably in a Western (one byte per character) language. I want to do some text processing on it, such as extracting all the words from it. Before doing anything, I want to use
to put everything into iso-8859-1 in (probable) good form.
Is there anything I can use that will give me the "probable encoding" for a file / string / whatever?
I was led in this direction by the venerable Thundergnat's answer to my
where he suggested I run Encode::from_to($latinresult, 'cp437', 'iso-8859-1'); before matching the output of a system call on my German WinXP box. But how did he know to use 'cp437'?
UPDATE: Thanks monks, Encode::Guess looks good. I'm going to go try it out.
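For anyone landing here later, this is a minimal sketch of how Encode::Guess might be applied to the question above. The helper name probable_encoding and the sample strings are made up for illustration. By default guess_encoding() only considers ascii, utf8 and BOM-marked UTF-16/32, so iso-8859-1 is passed in as an extra suspect.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns the canonical name of the guessed
# encoding, or undef when the guess fails or is ambiguous.
sub probable_encoding {
    my ($octets, @suspects) = @_;
    my $guess = guess_encoding($octets, @suspects);

    # On success guess_encoding() returns an encoding object;
    # otherwise it returns a plain string describing the problem.
    return ref $guess ? $guess->name : undef;
}

# "Stra\xDFe" is "Strasse" with a sharp s, as iso-8859-1 bytes.
my $name = probable_encoding("Stra\xDFe", 'iso-8859-1');
print defined $name ? "Probably $name\n" : "Could not decide\n";
```

Two caveats with this approach. Text that is valid UTF-8 also decodes cleanly as iso-8859-1, so with both as suspects the guess comes back ambiguous and the helper returns undef. And because every byte sequence is valid in a single-byte encoding, this method cannot tell cp437 from iso-8859-1; knowing that a Windows console emits cp437 has to come from outside the data.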