PerlMonks
Regex to encode entities in XML
by epoptai (Curate)
on Jun 11, 2001 at 01:52 UTC ( [id://87362]=perlquestion )
epoptai has asked for the wisdom of the Perl Monks concerning the following question:
(Edited by epoptai on 10/7/03 in reply to 296401)
I've got a problem with the output of PerlMonks' chatterbox XML ticker. When a high-bit ASCII character like 'á' is entered in the CB, the character is not encoded; it's transmitted in the XML stream in a way that causes XML::Simple to die (as expected when receiving bad XML). It would be best if legal XML were generated by PerlMonks, but that's not the case, so it needs to be dealt with.

I don't know much about this subject, and have been using the following code from jcwren to convert the problem characters into underscores:

That's very effective, but leaves something to be desired: the character behind the underscore. Since these characters can be detected and underscored, surely they can be detected and encoded properly? I've made many horribly broken attempts to encode these characters, but my lack of knowledge in this area always gets the last laugh.

Recently mirod posted Converting character encodings, which includes a regex from XML::TiePYX that gets very close to doing the job, but it only encodes some of the characters, not all. It barfs on ¤ and probably others:

I seek an extended version of the XML::TiePYX regex to find and encode the full range of high-bit characters specified in the first solution. I'd rather not use another module (XML parser or otherwise) for this task.

Thanks for your time - epoptai
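
For illustration, here is a minimal sketch of the kind of substitution being asked for. It is neither jcwren's code nor the XML::TiePYX regex. It assumes the chatterbox text arrives as single-byte ISO-8859-1 (Latin-1), where each byte value equals the Unicode code point, so every byte from 0x80 to 0xFF can be rewritten as a decimal numeric character reference. The helper name `encode_high_bit` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post.
# Assumption: $text holds ISO-8859-1 bytes, so ord() of a high-bit byte
# is also its Unicode code point.
sub encode_high_bit {
    my ($text) = @_;
    # Replace each byte in 0x80-0xFF with a reference such as &#233;
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

my $line = "caf\xE9 costs 5\xA4";
print encode_high_bit($line), "\n";   # prints: caf&#233; costs 5&#164;
```

If the stream is actually Windows-1252 or UTF-8, this mapping would be wrong for some or all bytes (for example 0x80 to 0x9F in Windows-1252), so the encoding assumption needs checking first.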