New Section Suggestion: Tip of the Day

XML uses Unicode in UTF-8 format to define all of its character data.

-- "Data Munging with Perl" by Dave Cross

<joke flavor=sarcastic type=ribbing>I bet you didn't know that. I bet you are a happier, wiser person now that you do. </joke>

It would be nice if I could post small things that I didn't know before that will probably help others too.

I don't think this is quite a meditation, but it is certainly useful.


Replies are listed 'Best First'.
Re: New Section Suggestion: Tip of the Day by mirod (Canon) on Aug 21, 2001 at 02:00 UTC
Well, actually this is not quite the whole story (sorry davorg and princepawn ;--).

XML does _not_ specify the encoding of the characters in a document. It strongly encourages the use of UTF-8 or UTF-16 (two ways of encoding Unicode characters); in fact, XML parsers are only required to recognize those two encodings. If the encoding is _not_ UTF-8 or UTF-16, then the XML declaration must specify the encoding of the document, which hopefully the parser will understand.

On its own, XML::Parser only understands UTF-8, UTF-16 and ISO-8859-1 (Latin-1, the encoding commonly used in Western Europe). US-ASCII (unaccented characters: every code below 128, control characters aside) is a subset of UTF-8, which means that if you only have to deal with US/English XML data you don't have to bother about any of this (for now). XML::Encoding adds support for a whole lot of common encodings (I think the only one really missing is one of the Chinese encodings).

XML::Parser converts all characters to UTF-8 before passing them to the calling application. The cleanest way to go back from UTF-8 to whatever encoding your system likes is to use the Text::Iconv module, provided your system has the `iconv` library installed. A dirty (but sometimes useful) hack is to use the `original_string` method to get the... original string (pre-UTF-8 conversion), but then you will have to parse start and end tags yourself to extract tag names and attributes. If you are converting your XML to HTML you might also want to have a look at HTML::Entities.

One last note: UTF-8 support is now pretty good in Perl, but you will have to wait for 5.8 to get UTF-8 hash keys (important for attribute names) and full regexp support.
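To make the encoding round trip concrete, here is a minimal sketch. It is not what XML::Parser does internally, and it swaps in the core Encode module (which ships with Perl 5.8 and later) for Text::Iconv so that it runs without the `iconv` library. The sample document, the regex that reads the declaration, and all variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A tiny Latin-1 document: \xE9 is e-acute as a single ISO-8859-1 byte.
my $doc = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<word>caf\xE9</word>\n};

# Read the declared encoding; fall back to UTF-8 when none is declared.
# (A real parser also checks for a byte order mark; this sketch does not.)
my ($declared) = $doc =~ /^<\?xml[^>]*\bencoding=["']([\w.-]+)["']/;
$declared ||= 'UTF-8';

# Roughly the parser's job: turn the declared encoding into UTF-8.
my $chars = decode($declared, $doc);
my $utf8  = encode('UTF-8', $chars);

# Going back from UTF-8 to the encoding the system prefers.
my $latin1 = encode('ISO-8859-1', decode('UTF-8', $utf8));

printf "declared: %s\n", $declared;
printf "UTF-8 is %d byte(s) longer than Latin-1\n", length($utf8) - length($latin1);
print  "round trip ok\n" if $latin1 eq $doc;
```

With Text::Iconv installed, the last conversion could instead be written as `Text::Iconv->new("UTF-8", "ISO-8859-1")->convert($utf8)`. A document containing only US-ASCII characters would come through every step byte for byte unchanged, which is why English-only data rarely runs into any of this.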
Re: New Section Suggestion: Tip of the Day by LD2 (Curate) on Aug 21, 2001 at 02:08 UTC
It's not a bad idea, but it has been briefly brought up before here and here. This may be messy, but I sort of liked the idea of just creating a node under Meditations called Perl Tips or whatnot. That way, this doesn't create any new work for vroom: it's simple and easy. If a new section is created down the line, that node can be dissected and I'm sure the tips can be added then.
Re: New Section Suggestion: Tip of the Day by Beatnik (Parson) on Aug 21, 2001 at 01:40 UTC
perl.org has their daily-tips mailing list (which seems to be not so daily).

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: New Section Suggestion: Tip of the Day by little (Curate) on Aug 21, 2001 at 11:40 UTC
Hey, just kidding: reading the replies, I suggest you'd better call it "prejudice of the day" though. (grin)

Have a nice day
All decision is left to your taste

