To get an answer to that, you'd have to tell us what is at line 2, column 25, byte 68 of your input. (And likely some of the surrounding text as well.)
Sorry, that's what I get for trying to reply at that hour. I have replicated the problem now by downloading your source and running recode utf8..latin1 on it.
The problem is in the XML parser, before the document ever reaches Twig. The twig encoding filters convert already-parsed information from one encoding to another, but they can't affect the parsing itself. The issue is that your XML declaration told the parser to expect one encoding, but the bytes it received are in another. In other words, if you had:
__DATA__
<?xml version = '1.0' encoding = 'iso-8859-1'?>
<Text>5CH (the BACKSLASH ý\ý in ISO-IR 6) shall</Text>
... and you were absolutely certain that your source file really was encoded in latin-1, you wouldn't have to mess with input filters at all. This would be sufficient to deal with it:
use XML::Twig;
my $twig = XML::Twig->new();
$twig->parse( $xml );   # $xml holds the document text shown above
If you later recoded that file to utf-8 (via something like recode latin1..utf8 filename), you might have problems with the charset again, though odds are that it would actually parse and give you garbage. THEN you might need to play with an input filter, not to get the parsing working, but to convert the garbage you got out of it to what you wanted.
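To see the two failure modes without involving the XML parser at all, here is a minimal sketch using only the core Encode module. It is an illustration of the byte-level mismatch, not what XML::Twig or the underlying parser does internally; the variable names are made up for the example, and I'm using the ý character (U+00FD) from your sample text as the test case.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+00FD (y with acute) as raw bytes in each encoding.
my $latin1_bytes = encode('iso-8859-1', "\x{FD}");   # one byte:  0xFD
my $utf8_bytes   = encode('UTF-8',      "\x{FD}");   # two bytes: 0xC3 0xBD

# Case 1: latin-1 bytes read as UTF-8. The byte 0xFD is never valid in
# UTF-8, so a strict decode dies. This is the "won't parse" situation.
my $strict_ok = eval { decode('UTF-8', $latin1_bytes, Encode::FB_CROAK); 1 };

# Case 2: UTF-8 bytes read as latin-1. Every byte is a legal latin-1
# character, so decoding succeeds but yields two characters instead of one.
# This is the "parses, but gives you garbage" situation.
my $garbage = decode('iso-8859-1', $utf8_bytes);

printf "strict UTF-8 decode of latin-1 bytes: %s\n",
    $strict_ok ? 'ok' : 'failed';
printf "latin-1 decode of UTF-8 bytes: %d characters (U+%04X U+%04X)\n",
    length($garbage), map { ord($_) } split //, $garbage;
```

The first case corresponds to a declaration that promises UTF-8 over a latin-1 file; the second corresponds to a declaration that promises iso-8859-1 over a file that has since been recoded to UTF-8.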