PerlMonks  

XML Parser encoding problem (was: XML::Parser)

by mahesh2812 (Initiate)
on Mar 14, 2002 at 06:30 UTC ( [id://151612]=perlquestion )

mahesh2812 has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl program which reads an XML string and parses it to retrieve a value.

I am using the following:

--------------------------------
Perl version: ActivePerl 5.6.1
XML::DOM
XML::Parser

The XML declares its encoding as ISO-8859-1.

One of the nodes has the value "Humörsvängningar". When I read the node value through the parser, the string comes back garbled, as "HumÃ¶rsvÃ¤ngningar".

Is there a solution to get the right string, with the encoding applied appropriately?

Edited 2002-03-14 by mirod: changed title, added <p> tags

Re: XML Parser
by mirod (Canon) on Mar 14, 2002 at 07:50 UTC

    As mentioned in the docs, XML::Parser converts all the text it parses to UTF-8. You have several ways to get the original encoding back: use the $p->original_string() method to get the text as it appeared in the document, or convert the UTF-8 back to ISO-8859-1. (BTW, are you sure you want to use this encoding? If you are dealing with Scandinavian text you might need ISO-8859-4 to get all the characters you need.) If your system has the iconv library (*nix, or MS-Windows with cygwin), you can use the Text::Iconv module for the conversion.

    <plug type="shameless">You can also use XML::Twig in keep_encoding mode or with an input filter to get the data in the required encoding:

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my $t = XML::Twig->new(input_filter => 'latin1');
    $t->parse( \*DATA);
    print "content: ", $t->root->text, "\n";

    __DATA__
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <doc>Humörsvängningar</doc>

    </plug>
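
For the "convert the UTF-8 back to ISO-8859-1" option mentioned above, here is a minimal sketch. It assumes a Perl with the core Encode module (5.8 or later), which is not what the original poster's 5.6.1 ships with; on that version Text::Iconv would be the route. The helper name utf8_to_latin1 is made up for illustration, and the input is assumed to be a raw UTF-8 byte string such as an older XML::Parser hands back.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes a string of UTF-8 encoded bytes and returns
# the same text as ISO-8859-1 bytes. Dies if the input is not valid UTF-8
# or contains a character that ISO-8859-1 cannot represent.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

    # Step 1: bytes -> Perl characters.
    my $chars = decode('UTF-8', $utf8_bytes, $check);

    # Step 2: Perl characters -> ISO-8859-1 bytes.
    return encode('ISO-8859-1', $chars, $check);
}

# "Humörsvängningar" as UTF-8 bytes: ö is C3 B6, ä is C3 A4.
my $from_parser = "Hum\xC3\xB6rsv\xC3\xA4ngningar";
my $latin1      = utf8_to_latin1($from_parser);

printf "%d bytes in, %d bytes out\n", length($from_parser), length($latin1);
```

If the parser already returns decoded character strings (as is likely on a modern Perl), the decode step is unnecessary and a plain encode('ISO-8859-1', $text) should be enough.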

    BTW, try not to use a module name as the title of a node, as this might interfere with the review of the module. Try "XML::Parser encoding problem", for example.
