in reply to Fixing broken character encoding
It is possible if it has been malformed only once (a single step); after multiple iterations it can be impossible.
IIRC http://validator.w3.org/ can help (Bundle::W3C::Validator).
As can these:
HTML::Encoding - Determine the encoding of HTML/XML/XHTML documents
Encode::Detective - detect a data encoding
Encoding::FixLatin - takes mixed encoding input and produces UTF-8 output
Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one
But you ought to post some minimal HTML.
I figure it ought to be as simple as parsing the file, decoding the entities, treating the string as octets, and deciding what charset it is.
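Something like this untested-against-your-data sketch is what I have in mind for the single-step case. It assumes the damage is UTF-8 bytes that were read as Latin-1 (possibly written out as entities), it skips the HTML parsing step and works on a bare string, and `fix_once` is just a name I made up. HTML::Entities comes from CPAN (the HTML-Parser distribution); Encode is core.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use HTML::Entities qw(decode_entities);

# Hypothetical helper: try to undo ONE round of "UTF-8 read as Latin-1" damage.
sub fix_once {
    my ($text) = @_;

    # Step 1: decode the entities, e.g. &Atilde;&copy; becomes U+00C3 U+00A9.
    my $chars = decode_entities($text);

    # Characters above U+00FF cannot be single octets, so this guess does not apply.
    return $chars if $chars =~ /[^\x00-\xFF]/;

    # Step 2: treat the string as octets (one byte per character).
    my $octets = encode('iso-8859-1', $chars);

    # Step 3: decide the charset. If the octets are strictly valid UTF-8,
    # assume that is what they were; otherwise keep the decoded text as it was.
    my $fixed = eval { decode('UTF-8', $octets, Encode::FB_CROAK()) };
    return defined $fixed ? $fixed : $chars;
}

binmode STDOUT, ':encoding(UTF-8)';
print fix_once('caf&Atilde;&copy;'), "\n";   # entities spelling the UTF-8 bytes of e-acute
print fix_once('caf&eacute;'), "\n";         # already correct, left alone
```

If the input went through Windows-1252 rather than Latin-1, or was mangled more than once, this guess will not be enough, and the modules listed above are a better bet.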
Replies are listed 'Best First'.
Re^2: Fixing broken character encoding
by Anonymous Monk on Jul 26, 2012 at 04:04 UTC
by pfaut (Priest) on Jul 26, 2012 at 10:25 UTC
In Section
Seekers of Perl Wisdom