I can't see from the docs how to get encoded data back
You might not be able to. The whole point of the parser is to extract the information represented by the XML document, no matter how it's encoded using XML.
You shouldn't have to care whether "Ú" is stored as a character reference (such as &#xDA;), as bytes C3 9A (in an XML document that uses UTF-8), or as byte DA (in an XML document that uses cp1252). Nor should you want to know.
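To illustrate the point outside of XML, here is a minimal sketch using only the core Encode module. It decodes the two byte sequences mentioned above and shows that both yield the same character. The variable names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same character, U+00DA, delivered in two different encodings.
my $from_utf8   = decode('UTF-8',  "\xC3\x9A");   # two bytes in UTF-8
my $from_cp1252 = decode('cp1252', "\xDA");       # one byte in cp1252

# After decoding, both are the same one-character string.
printf "U+%04X\n", ord($from_utf8);
printf "U+%04X\n", ord($from_cp1252);
print $from_utf8 eq $from_cp1252 ? "same character\n" : "different\n";
```

Once the data is decoded, nothing in the resulting string records which of the two encodings it came from.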
my ($container) = $dom->findnodes('/container');
my $n2 = $container->appendChild('XML::LibXML::Element'->new('node2'));
$n2->appendText("\N{LATIN CAPITAL LETTER U WITH ACUTE}");
binmode *STDOUT, ':encoding(UTF-8)';
print $dom;
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
I can't see from the docs how to get encoded data back
I didn't find a way either, but this is probably intentional, because you shouldn't need one. Relying on the original encoding is bad practice: as soon as Perl has parsed your document into a tree, it is entitled to forget which encoding the document was delivered in.
If you want encoded data back, then you get to choose the encoding, and do the encoding yourself.
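A minimal sketch of "encode it yourself", assuming you already have a character string in hand (for instance the text content of a node). It uses only the core Encode module; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string holding U+00DA (LATIN CAPITAL LETTER U WITH ACUTE).
my $text = "\x{DA}";

# Pick the encoding you want and produce the bytes explicitly.
my $utf8_bytes   = encode('UTF-8',  $text);
my $cp1252_bytes = encode('cp1252', $text);

# Show the resulting bytes as hex.
printf "UTF-8:  %s\n", uc unpack('H*', $utf8_bytes);    # C39A
printf "cp1252: %s\n", uc unpack('H*', $cp1252_bytes);  # DA
```

The character string is the same in both cases; only the explicit encode step decides which bytes come out.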
I also think that a lot of Perl module documentation should be revisited with regard to the ominous "UTF-8 flag". The parenthetical remark "(UTF-8 encoded with UTF8 flag on)" is misleading at best and should be removed: the relevant thing is "character string", as opposed to "binary string" ("bytes" and "encoded" strings are binary for that purpose). For the user of any module, it isn't relevant which encoding Perl uses to store character strings internally.
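As a small illustration of why the flag should not matter to module users, the sketch below takes the same one-character string in both of Perl's internal representations and compares them. The built-in utf8::upgrade and utf8::is_utf8 functions are used only to peek at the internals; the variable names are made up for this example.

```perl
use strict;
use warnings;

# Two copies of the same one-character string, U+00DA.
my $native   = "\xDA";
my $upgraded = "\xDA";

# Force the second copy into Perl's internal UTF-8 representation.
utf8::upgrade($upgraded);

# The internal flag differs between the two copies...
printf "flag on native:   %d\n", utf8::is_utf8($native)   ? 1 : 0;
printf "flag on upgraded: %d\n", utf8::is_utf8($upgraded) ? 1 : 0;

# ...but as character strings they are identical.
print $native eq $upgraded ? "equal\n" : "not equal\n";
printf "lengths: %d and %d\n", length($native), length($upgraded);
```

Both copies compare equal and have length 1, which is the sense in which the internal storage is an implementation detail.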
Thanks. If the intention of the module is to decode data, I can react accordingly. I would also agree that the documentation could be improved, but I do have a tendency to rant about documentation and don't think that it would help solve my problem.
Regards,
John Davies