Ea has asked for the wisdom of the Perl Monks concerning the following question:
I'm parsing an XML document that uses an acute accent as a right quote. It's chr(180) (aka U+00B4) and the document encoding is UTF-8. When I run XML::Parser over it (or even the xml_pp tool), I get a "not well-formed (invalid token)" error.
I've naively tried adding use utf8; to the script, but I still get the error. I believe I could just tr/// that bad boy into something less problematic, but I was wondering if there was a lazier way, like a setting in XML::Parser that I can add to the handlers?
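For reference, a minimal sketch of the tr/// workaround. It assumes the file might hold the accent as a lone 0xB4 byte (Latin-1) even though it is declared UTF-8, which would explain the invalid token; that is a guess, not something confirmed here. replace_acute is a hypothetical helper name, and only the core Encode module is used. Note that use utf8; only declares the encoding of the script's own source, so it does not change how the input data is read.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes the raw bytes of the XML file and returns
# UTF-8 bytes with every acute accent (U+00B4) replaced by an apostrophe.
sub replace_acute {
    my ($bytes) = @_;

    # Try strict UTF-8 first, leaving $bytes untouched on failure.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };

    # Assumption: input that is not valid UTF-8 is really Latin-1.
    $text = decode('ISO-8859-1', $bytes) unless defined $text;

    # Swap the acute accent for a plain ASCII apostrophe.
    $text =~ tr/\x{B4}/'/;

    return encode('UTF-8', $text);
}
```

The returned string should then be safe to hand to XML::Parser's parse method instead of parsing the file directly.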
For the curious, I'm getting my output from LaTeXML, a set of Perl tools for converting LaTeX to XML. There might be some scope to process the output before I parse the XML, but I suspect that it'll look a.
thanks,
perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?
Replies are listed 'Best First'.
Re: XML invalid token
by Anonymous Monk on Nov 15, 2011 at 16:10 UTC
by Ea (Chaplain) on Nov 15, 2011 at 16:19 UTC
Re: XML invalid token
by Sinistral (Monsignor) on Nov 15, 2011 at 15:28 UTC