PerlMonks  

Re: New Section Suggestion: Tip of the Day

by mirod (Canon)
on Aug 21, 2001 at 02:00 UTC ( [id://106405]=note )


in reply to New Section Suggestion: Tip of the Day

Well, actually this is not quite the whole story (sorry davorg and princepawn ;--).

  • XML does _not_ specify the encoding of the characters in a document,
  • it strongly encourages the use of UTF-8 or UTF-16 (two ways of encoding Unicode characters); in fact, XML parsers are only required to recognize those two encodings,
  • if the encoding is _not_ UTF-8 or UTF-16, then the XML declaration must specify the encoding of the document, which hopefully the parser will understand,
  • XML::Parser only understands UTF-8, UTF-16 and ISO-8859-1 (latin-1, the encoding commonly used in Western Europe),
  • US-ASCII (unaccented characters: all characters with codes up to 127, control characters excepted) is a subset of UTF-8, which means that if you only have to deal with US/English XML data you don't have to bother about it (for now),
  • XML::Encoding adds support for a whole lot of common encodings (I think the only one really missing is one of the Chinese encodings),
  • XML::Parser converts all characters to UTF-8 before passing them to the calling application,
  • the cleanest way to go back from UTF-8 to whatever encoding your system likes is to use the Text::Iconv module, provided your system has the iconv library installed,
  • a dirty (but sometimes useful) hack is to use the original_string method to get the... original string (pre-UTF-8 conversion), but then you will have to parse start and end tags to extract tag names and attributes,
  • if you are converting your XML to HTML you might also want to have a look at HTML::Entities.
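
To make the points above concrete, here is a minimal sketch of the round trip, assuming XML::Parser and Text::Iconv are both installed. The sample document and the variable names are made up for illustration. It parses a small Latin-1 document, collects the UTF-8 text that XML::Parser hands to the Char handler, converts it back to Latin-1 with Text::Iconv, and also grabs the untouched input through original_string.

```perl
use strict;
use warnings;
use XML::Parser;
use Text::Iconv;

# Hypothetical sample document in Latin-1: "\xE9" is the single byte for e-acute.
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n}
        . qq{<doc><name>caf\xE9</name></doc>};

# iconv converter from UTF-8 (what the parser emits) back to Latin-1.
my $to_latin1 = Text::Iconv->new('UTF-8', 'ISO-8859-1');

my ($utf8, $latin1, $original) = ('', '', '');

my $parser = XML::Parser->new(
    Handlers => {
        Char => sub {
            my ($expat, $text) = @_;

            # Recent perls may flag $text as a character string; work on a
            # copy holding the raw UTF-8 bytes so the hex dump is unambiguous.
            my $bytes = $text;
            utf8::encode($bytes) if utf8::is_utf8($bytes);

            # Append rather than assign: the handler can fire more than
            # once for a single run of text.
            $utf8     .= $bytes;
            $latin1   .= $to_latin1->convert($bytes);
            $original .= $expat->original_string;   # input as it was, unconverted
        },
    },
);
$parser->parse($xml);

# Hex dumps make the difference in byte length visible.
my $utf8_hex     = unpack 'H*', $utf8;
my $latin1_hex   = unpack 'H*', $latin1;
my $original_hex = unpack 'H*', $original;

printf "%-10s %s\n", 'utf-8:',    $utf8_hex;
printf "%-10s %s\n", 'latin-1:',  $latin1_hex;
printf "%-10s %s\n", 'original:', $original_hex;
```

The accented character takes two bytes in the UTF-8 dump and one byte after the conversion back to Latin-1. In a Start handler original_string would return the whole raw tag, which is why you would then have to pick tag names and attributes apart yourself.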

One last note: UTF-8 support is now pretty good in Perl, but you will have to wait for 5.8 to get UTF-8 hash keys (important for attribute names) and full regexp support.
