Well, actually this is not quite the whole story (sorry davorg and princepawn ;--).
- XML does _not_ specify the encoding of the characters in a document,
- it strongly encourages the use of UTF-8 or UTF-16 (two ways of encoding Unicode characters); in fact, XML parsers are only required to recognize those two encodings,
- if the encoding is _not_ UTF-8 or UTF-16, the XML declaration must specify the encoding of the document, which hopefully the parser will understand,
- XML::Parser only understands UTF-8, UTF-16 and ISO-8859-1 (latin-1, the encoding commonly used in Western Europe),
- US-ASCII (unaccented characters: everything with a code below 128, control characters aside) is a subset of UTF-8, which means that if you only have to deal with US/English XML data you don't have to bother about it (for now),
- XML::Encodings adds support for a whole lot of common encodings (I think the only one really missing is one of the Chinese encodings),
- XML::Parser converts all characters to UTF-8 before passing them to the calling application,
- the cleanest way to go back from UTF-8 to whatever encoding your system likes is to use the Text::Iconv module, provided your system has the iconv library installed,
- a dirty (but sometimes useful) hack is to use the original_string method to get the... original string (pre-UTF-8 conversion), but then you will have to parse start and end tags to extract tag names and attributes,
- if you are converting your XML to HTML you might also want to have a look at HTML::Entities.
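The round trip described in the list above (a Latin-1 document goes in, XML::Parser hands back UTF-8, Text::Iconv converts it back) could look roughly like this. It is only a sketch, assuming XML::Parser and Text::Iconv are installed; the sample document and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::Parser;
use Text::Iconv;

# Hypothetical sample document in Latin-1: \xE9 is the single byte for e-acute.
# The XML declaration names the encoding, since it is neither UTF-8 nor UTF-16.
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<doc>caf\xE9</doc>};

# Collect character data as the parser reports it (possibly in several chunks).
my $utf8_text = '';
my $parser = XML::Parser->new(
    Handlers => {
        Char => sub {
            my ($expat, $string) = @_;
            $utf8_text .= $string;    # already converted to UTF-8 by the parser
        },
    },
);
$parser->parse($xml);

# Newer perls may flag the string as characters rather than bytes;
# make sure we hold plain UTF-8 bytes before handing them to iconv.
utf8::encode($utf8_text) if utf8::is_utf8($utf8_text);

# Convert back from UTF-8 to the encoding the rest of the system expects.
my $to_latin1   = Text::Iconv->new('UTF-8', 'ISO-8859-1');
my $latin1_text = $to_latin1->convert($utf8_text);

# Show the bytes in hex so the output does not depend on the terminal.
my $hex = sub { join ' ', map { sprintf '%02X', ord } split //, shift };
print "UTF-8:   ", $hex->($utf8_text),   "\n";   # 63 61 66 C3 A9
print "Latin-1: ", $hex->($latin1_text), "\n";   # 63 61 66 E9
```

The e-acute takes two bytes in UTF-8 and one in Latin-1, which is why the conversion step matters as soon as the data leaves plain US-ASCII.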
One last piece of information: UTF-8 support is now pretty good in Perl, but you will have to wait for 5.8 to get UTF-8 hash keys (important for attribute names) and full regexp support.
It's not a bad idea, but it has been briefly brought up before here and here. This may be messy, but I sort of liked the idea of just creating a node under Meditations called Perl Tips or whatnot. That way, this doesn't create any new work for vroom: it's simple and easy. If a new section is created down the line, that node can be dissected and I'm sure the tips can be added then.
perl.org has their daily-tips mailing list (which seems to be not so daily).
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Hey, just kidding: reading the replies, I suggest you'd better call it prejudice of the day, though. (grin)
Have a nice day
All decision is left to your taste