Ea has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing an XML document that has an acute accent acting as a right quote. It's chr(180) (aka U+00B4) and the document encoding is UTF-8. When I run XML::Parser over it (or even the xml_pp tool), I get a "not well-formed (invalid token)" error.

I've naively tried adding use utf8; to the script, but I still get the error. I believe I could just tr/// that bad boy into something less problematic, but I was wondering if there was a lazier way, like a setting in XML::Parser that I can add to the handlers?
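For reference, here is a minimal sketch of the tr/// fallback, run on the raw bytes before they reach the parser. It rests on an unconfirmed assumption: that the accent is stored as a single 0xB4 byte (Latin-1 style), not as the two-byte UTF-8 sequence 0xC2 0xB4. The helper name replace_acute and the sample string are made up for illustration. Only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of XML::Parser): takes the raw bytes of the
# document and returns UTF-8 bytes with U+00B4 replaced by an ASCII apostrophe.
sub replace_acute {
    my ($bytes) = @_;

    # Try a strict UTF-8 decode first. A lone 0xB4 byte is not valid UTF-8,
    # so decode() croaks and the eval yields undef.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };

    # Assumption: if the strict decode fails, the bytes are really ISO-8859-1.
    $text = decode('ISO-8859-1', $bytes) unless defined $text;

    $text =~ tr/\x{B4}/\x27/;    # U+00B4 -> ASCII apostrophe (0x27)
    return encode('UTF-8', $text);
}

my $raw   = "<p>Ea\xB4s question</p>";    # sample input with a lone 0xB4 byte
my $clean = replace_acute($raw);
print "$clean\n";                         # prints <p>Ea's question</p>
```

Note that use utf8; only declares the encoding of the script's own source code, so it has no effect on data read from a file. The XML::Parser documentation also lists a ProtocolEncoding option (with ISO-8859-1 among the built-in encodings); whether it helps in this case is untested.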

For the curious, I'm getting my output from LaTeXML, a set of Perl tools for converting LaTeX to XML. There might be some scope to process the output before I parse the XML, but I suspect that it'll look a.


perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?