PerlMonks  

Re: XML invalid token

by Anonymous Monk
on Nov 15, 2011 at 16:10 UTC


in reply to XML invalid token

It sounds like your XML contains the ACUTE ACCENT character (U+00B4) as the single ISO-8859-1 byte 0xB4, whereas a document declared as UTF-8 must encode it as the two bytes 0xC2 0xB4. This usually happens when people produce XML by concatenating strings instead of using a proper, encoding-aware XML library.

Is my assumption correct? If so, you have to preprocess the input to turn it into a standards-compliant XML stream. Either replace the byte as described above or, if the problem really affects all bytes in the range 0x80 to 0xFF, simply change the encoding declaration in the XML prolog, e.g.:

<?xml version="1.0" encoding="ISO-8859-1" ?>
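If you would rather repair the bytes than change the declaration, here is a minimal sketch using only the core Encode module. The helper name `to_utf8_bytes` is made up for illustration, and the sketch assumes the whole input is either valid UTF-8 or entirely ISO-8859-1. A file that mixes both would be double-encoded by it.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the input as valid UTF-8 bytes.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # Strict decode: dies on byte sequences that are not valid UTF-8.
    # LEAVE_SRC keeps decode() from modifying $bytes in place.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $bytes if $ok;

    # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}
```

Read the file with `binmode` (or a `:raw` layer) so the helper sees the original bytes, and write the result back the same way before handing it to the parser.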

Replies are listed 'Best First'.
Re^2: XML invalid token
by Ea (Chaplain) on Nov 15, 2011 at 16:19 UTC
    That's the answer I was looking for. I changed the encoding from UTF-8 to ISO-8859-1 and the error disappeared. This gives me something to ask the LaTeXML folx: why are they producing XML documents that claim to be UTF-8 when they aren't?

    Many thanks, oh Nameless One!

  • Update - I even found that latexml has a --inputencoding=iso-8859-1 option to do just that. Now to figure out how to automatically detect a LaTeX file's encoding ...
  • Update the II - checking out Encode::Guess and Encode::Detect
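    A rough sketch of what the Encode::Guess route might look like. The helper name `detect_encoding` and the ISO-8859-1 fallback are assumptions for illustration, not something latexml provides. ISO-8859-1 is deliberately not added as a suspect: every byte string is valid ISO-8859-1, so the guess would come back ambiguous for any real UTF-8 input.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding(); default suspects are ascii and utf8

# Hypothetical helper: name of the encoding to assume for these raw bytes.
sub detect_encoding {
    my ($bytes, $fallback) = @_;
    $fallback = 'iso-8859-1' unless defined $fallback;   # assumed default

    my $guess = guess_encoding($bytes);

    # guess_encoding() returns an encoding object on success
    # and a plain error string when no suspect fits.
    return ref $guess ? $guess->name : $fallback;
}
```

    The returned name could then be fed to the --inputencoding option. The bytes must be read raw (no decoding layer on the filehandle) for the guess to mean anything.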

    perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?
