Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I have an XML file encoded in ISO-8859-1. I parse it with XML::DOM::Parser, then extract some of the text and attribute values using the methods of XML::DOM::Node. The problem is that the strings I get from these methods are encoded in UTF-8, so all special characters (accented vowels, German umlauts, ...) come out garbled. Is there any way to set the encoding of the strings you get from the Node class?
Re: XML encoding ISO-8859-1
by mirod (Canon) on Aug 23, 2001 at 13:39 UTC
There is no way to set the encoding for the strings; you will have to convert them yourself.
Search for Unicode, character conversion, or encoding and you will find tons of nodes describing how to go from UTF-8 to ISO-8859-1 (use Text::Iconv).
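A minimal sketch of that conversion step. It uses the Encode module (core in newer Perls) instead of Text::Iconv, purely so it runs without an extra install. It assumes the parser hands back raw UTF-8 bytes, as described in the question; if your strings are already Perl character strings, only the encode step applies. The helper name utf8_to_latin1 is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 bytes (as assumed to come back from
# the parser) and returns the same text as ISO-8859-1 bytes.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);        # characters -> Latin-1 bytes
}

# "Mueller" with a u-umlaut, written out as UTF-8 bytes.
my $from_parser = "M\xC3\xBCller";
my $latin1      = utf8_to_latin1($from_parser);
```

Characters that have no ISO-8859-1 equivalent are replaced with a substitution character by default, so check the result if your data may contain anything outside Latin-1.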
Re: XML encoding ISO-8859-1
by stefan k (Curate) on Aug 23, 2001 at 13:09 UTC
|
| [reply] |
|
You are right that the whole file is encoded in ISO-8859-1, and I account for this by setting the ProtocolEncoding of the Parser. The problem is that the strings I get *back* from the Parser are NOT in the same encoding, but in UTF-8 (it says so somewhere in the documentation).
Changing the encoding of the input file would be an option, but how do I do this? Just putting 'encoding="UTF-8"' into the XML declaration does not work, because then I get malformedness errors (because the input file is not in UTF-8).
Robert
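On the "how do I do this?" part: the declaration and the actual bytes have to change together. A rough sketch, assuming the whole document fits in memory as a byte string and the XML declaration sits at the very start; the helper name latin1_xml_to_utf8 is made up for illustration, and it uses the Encode module rather than anything from XML::DOM.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode an ISO-8859-1 XML document (given as
# bytes) to UTF-8 and update the XML declaration to match.
sub latin1_xml_to_utf8 {
    my ($latin1_bytes) = @_;
    my $xml = decode('ISO-8859-1', $latin1_bytes);   # bytes -> characters
    # Rewrite only the encoding value inside the leading XML declaration.
    $xml =~ s/\A(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/${1}${2}UTF-8${2}/;
    return encode('UTF-8', $xml);                    # characters -> UTF-8 bytes
}

my $in  = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<name>M\xFCller</name>\n};
my $out = latin1_xml_to_utf8($in);
```

The parser output would still be UTF-8 after this, so it only helps if the rest of the program can work with UTF-8 as well.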