PerlMonks  

Re: error parsing utf8 chars using XML DOM parser

by Anonymous Monk
on Nov 01, 2011 at 15:10 UTC ( [id://935140]=note )


in reply to error parsing utf8 chars using XML DOM parser

You need to make sure your output is encoded as UTF-8 too; otherwise Perl prints the string in whatever form it holds internally, which is not necessarily valid UTF-8.

If writing to a normal file, you can just specify utf8 encoding:

open(my $fh, '>:utf8', 'out.xml') || die "Failed to open file";
print $fh $str;
close($fh) || die "Failed to close file";
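To see what the encoding layer does on disk, here is a minimal sketch that writes the sample string through a layer and reads the file back as raw bytes. It uses the stricter `:encoding(UTF-8)` spelling rather than `:utf8`, and a `File::Temp` scratch file in place of `out.xml`; both choices are assumptions for illustration, not something the original post prescribes.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Character string from the thread's sample; \x{e9} is e-acute (U+00E9).
my $str = "Issu\x{e9}T\x{e9}st";

# Hypothetical scratch file standing in for out.xml.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Failed to close temp file: $!";

# Write through an encoding layer so characters become UTF-8 bytes on disk.
open(my $out, '>:encoding(UTF-8)', $path) or die "Failed to open $path: $!";
print {$out} $str;
close($out) or die "Failed to close $path: $!";

# Read the file back without any decoding to inspect the bytes written.
open(my $in, '<:raw', $path) or die "Failed to open $path: $!";
my $bytes = do { local $/; <$in> };
close($in) or die "Failed to close $path: $!";

# Each e-acute occupies two bytes in UTF-8, so the byte count is larger.
printf "characters: %d, bytes on disk: %d\n", length($str), length($bytes);
print "matches Encode::encode\n" if $bytes eq encode('UTF-8', $str);
```

The file ends up two bytes longer than the string has characters, and the bytes are the same ones `Encode::encode` would produce.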

For STDOUT, declare that your default output layer is UTF-8 and that this should apply to the STD* handles too:

use XML::DOM;
use open OUT => ':utf8';
use open ':std';

# as before...

print $str;

See "perldoc open" for some discussion.
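The effect of an output layer can also be checked per handle with `binmode`. The sketch below is only an illustration: `emitted_bytes` is a hypothetical helper, and an in-memory handle stands in for STDOUT so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Character string from the thread's sample; \x{e9} is e-acute (U+00E9).
my $str = "Issu\x{e9}T\x{e9}st";

# Hypothetical helper: print $text to an in-memory handle (standing in for
# STDOUT), optionally applying a layer with binmode first, and return the
# bytes that came out.
sub emitted_bytes {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Failed to open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} $text;
    close($fh) or die "Failed to close in-memory handle: $!";
    return $buffer;
}

my $plain   = emitted_bytes($str);                      # no layer applied
my $encoded = emitted_bytes($str, ':encoding(UTF-8)');  # UTF-8 layer applied

# Hex dumps show e9 (a single Latin-1 byte) versus c3a9 (UTF-8) for e-acute.
print "no layer:    ", unpack('H*', $plain), "\n";
print "UTF-8 layer: ", unpack('H*', $encoded), "\n";
```

Without a layer the e-acute goes out as the single byte `e9`, which is not valid UTF-8; with the layer it goes out as `c3 a9`.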

Re^2: error parsing utf8 chars using XML DOM parser
by avih (Initiate) on Nov 01, 2011 at 17:31 UTC

    Thanks for the answers. Halfway there. If I get the string myself with getData on DOM::Node, it looks great, but I still get gibberish when printing the string XML::DOM produces.

    xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Name>IssuéTést</Name>

    code:

    #!/usr/bin/perl -w
    use XML::DOM;
    use Encode;
    use open OUT => ":utf8";
    use open ":std";

    my $XmlParserObj = XML::DOM::Parser->new();
    open(IN,"<:utf8","in.xml");
    my @in = <IN>;
    my $inStr = join("",@in);
    #$inStr = encode("utf8",$inStr); # redundant if I use <:utf8 in open
    #$inStr = decode("utf8",$inStr); # make all tested strings get "?" instead of latin chars
    my $doc = $XmlParserObj->parse($inStr);
    my $value = $doc->getElementsByTagName("Name")->item(0)->getChildNodes()->item(0)->getData();
    my $str = $doc->toString();
    #binmode(STDOUT,":utf8"); # redundant
    print "is input utf8 ? ",Encode::is_utf8($inStr),"\n";
    print "Input:\n".$inStr;
    print "is value utf8 ? ",Encode::is_utf8($value),"\n";
    print "Value: ".$value."\n";
    print "is output utf8 ? ",Encode::is_utf8($str),"\n";
    print "Output:\n".$str;
    exit(0);

    output:

    is input utf8 ? 1
    Input:
    <?xml version="1.0" encoding="utf-8"?>
    <Name>IssuéTést</Name>
    is value utf8 ? 1
    Value: IssuéTést
    is output utf8 ? 1
    Output:
    <?xml version="1.0" encoding="utf-8"?>
    <Name>Issu&#14932;st</Name>

    Thanks again

      Works for me. What version of XML::DOM do you have?

      Solved. Updating the modules and a little help from the Encode utilities solved it. Thanks guys.
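The thread does not say which Encode calls resolved it, so the following is only a generic sketch of the two core utilities, `decode` (bytes to characters) and `encode` (characters to bytes), plus the double-encoding mistake that commonly produces garbled output. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes, as they would arrive from a file read without a layer.
my $bytes = "Issu\xc3\xa9T\xc3\xa9st";

# decode: bytes -> characters. Each two-byte sequence becomes one character.
my $chars = decode('UTF-8', $bytes);

# encode: characters -> bytes. This round-trips to the original byte string.
my $round_trip = encode('UTF-8', $chars);

# Encoding an already-encoded string is the classic mistake: every high byte
# is treated as a character in its own right and encoded a second time.
my $double = encode('UTF-8', $round_trip);

printf "bytes: %d, characters: %d, double-encoded: %d\n",
    length($bytes), length($chars), length($double);
```

Decoding exactly once on input and encoding exactly once on output (by hand or through an I/O layer, but not both) avoids the double-encoded case.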
