tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:
Let's say I want to process some text whose encoding is uncertain, except that it is probably text, and probably in a Western (single-byte character) language. I want to do some text processing on it, such as extracting all the words from it. Before doing anything, I want to use
Encode::from_to($line, $probable_encoding, 'iso-8859-1')
to put everything into iso-8859-1 in (probable) good form.
Is there anything I can use that will give me the "probable encoding" for a file / string / whatever?
I was led in this direction by the venerable Thundergnat's answer to my earlier question, "matching german characters output from system call", where he suggested I run Encode::from_to($latinresult, 'cp437', 'iso-8859-1'); before matching the output of a system call on my German WinXP box. But how did he know to use 'cp437'?
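For anyone following along, here is a minimal sketch of what that from_to() call does to the bytes. The sample string is my own assumption (the word "Grüße" written out as cp437 bytes), not actual output from the system call in question.

```perl
use strict;
use warnings;
use Encode ();

# Assumed sample data: "Grüße" as cp437 bytes.
# In cp437, ü is 0x81 and ß is 0xE1.
my $console_bytes = "Gr\x81\xE1e";

# from_to() converts the octets in place. It returns the new length
# in octets on success, or undef on error.
my $length = Encode::from_to($console_bytes, 'cp437', 'iso-8859-1');
defined $length or die "conversion failed";

# Dump the converted bytes as hex. In iso-8859-1, ü is 0xFC and ß is 0xDF.
print unpack('H*', $console_bytes), "\n";
```

Note that from_to() overwrites its first argument, so keep a copy if the original bytes are still needed.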
UPDATE: Thanks monks, Encode::Guess looks good. I'm going to go try it out.
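A rough sketch of how Encode::Guess might be used here, under some assumptions: probable_encoding() is a hypothetical helper name of my own, and the sample line is made-up iso-8859-1 data. As far as I understand the module's documentation, it checks ascii and utf8 by default plus whatever suspects are passed in, and it cannot tell single-byte encodings apart, so it only behaves when a single one is listed as a suspect.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns the canonical name of the guessed
# encoding, or undef if the guess failed or was ambiguous.
sub probable_encoding {
    my ($octets, @suspects) = @_;

    # guess_encoding() returns an encoding object on success
    # and a plain error string otherwise, hence the ref() check.
    my $guess = guess_encoding($octets, @suspects);
    return ref $guess ? $guess->name : undef;
}

# Assumed sample data: "Grüße" as iso-8859-1 bytes.
my $line = "Gr\xFC\xDFe";

my $name = probable_encoding($line, 'iso-8859-1');
print defined $name ? "looks like $name\n" : "could not decide\n";

# Only convert when the guess succeeded and differs from the target.
if (defined $name && $name ne 'iso-8859-1') {
    Encode::from_to($line, $name, 'iso-8859-1');
}
```

Passing two single-byte suspects at once (say 'iso-8859-1' and 'cp437') leaves the guess ambiguous, since both can decode any byte string. So this does not answer how to pick cp437 over iso-8859-1 from the bytes alone.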
Replies are listed 'Best First'.
Re: What encoding am I (probably) using?
by mlh2003 (Scribe) on May 13, 2005 at 13:00 UTC
Re: What encoding am I (probably) using?
by thundergnat (Deacon) on May 13, 2005 at 13:02 UTC
Re: What encoding am I (probably) using?
by ysth (Canon) on May 13, 2005 at 13:25 UTC
by tphyahoo (Vicar) on May 13, 2005 at 13:58 UTC
Re: What encoding am I (probably) using?
by graff (Chancellor) on Sep 21, 2005 at 15:15 UTC