PerlMonks  

Re: XML::Parser Encoding (UTF-8 -> ISO-8859-1)

by grantm (Parson)
on Sep 12, 2002 at 02:13 UTC


in reply to XML::Parser Encoding (UTF-8 -> ISO-8859-1)

The Perl-XML FAQ has a section on encodings.

When you parse a file, the resulting data in the Perl variables will be UTF-8 encoded regardless of the source encoding. I'm not an expert on MySQL, but I wouldn't have thought that the act of INSERTing into a table would convert characters from UTF-8 to ISO-8859-1.
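To see this in action, here is a minimal sketch, assuming XML::Parser is installed (the sample document and element name are made up). The document declares ISO-8859-1 and contains the single byte 0xE9 for é, yet the Char handler receives a Perl character string in which é is the one character U+00E9:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document: declared as ISO-8859-1, "\xe9" is the single byte for é
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<name>Caf\xe9</name>};

my $text = '';
my $parser = XML::Parser->new(
    Handlers => {
        # Char handler is called as ($expat, $string); accumulate all character data
        Char => sub { $text .= $_[1] },
    },
);
$parser->parse($xml);

# The Latin-1 byte has been decoded: 4 characters, the last one U+00E9
printf "%d characters, code points %vX\n", length $text, $text;
# prints "4 characters, code points 43.61.66.E9"
```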

With Perl 5.6.0 and later, you can convert a UTF-8 string to a Latin-1 string with the somewhat cryptic:

use utf8; my $latin = pack("C*", unpack('U*', $utf));
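On Perl 5.8 and later, the core Encode module offers a more readable alternative to the pack/unpack trick and lets you choose what happens to characters outside Latin-1 (pack's 'C' format just wraps values above 255). This is a sketch with made-up sample strings, not the only way to do it:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical input: a character string such as XML::Parser hands back
my $utf = "Caf\x{e9}";

# Encode as ISO-8859-1 bytes; é (U+00E9) becomes the single byte 0xE9
my $latin = Encode::encode('iso-8859-1', $utf);

# Smart quotes (U+201C, U+201D) have no Latin-1 equivalent
my $quoted = "\x{201C}hi\x{201D}";

# Default CHECK: each unmappable character is replaced by '?'
my $lossy = Encode::encode('iso-8859-1', $quoted);

# FB_CROAK dies instead; LEAVE_SRC stops encode() from modifying $quoted in place
my $strict = eval {
    Encode::encode('iso-8859-1', $quoted, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print defined $strict ? "converted\n" : "not representable in Latin-1: $@";
```

Whether silently getting '?' in the database is acceptable decides which CHECK value to use; FB_CROAK at least makes the problem visible. Encode also provides FB_HTMLCREF and FB_XMLCREF, which substitute numeric character references instead.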

As jkahn said, it's not possible to map every Unicode character to Latin-1. In particular, the 'smart quote' characters from MS Office apps (U+2018 to U+201D) have no Latin-1 equivalents. You could simply encode characters beyond 0x7f as numeric entities (if you're ultimately going to write them back out as XML or HTML), or you could replace troublesome characters with more generic equivalents. The FAQ has some code snippets for both options.
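Both options can be sketched in a few lines of core Perl. This assumes the input is a character string (as XML::Parser returns), not raw UTF-8 bytes; the sub names and the replacement table are hypothetical, not the FAQ's code:

```perl
use strict;
use warnings;

# Hypothetical table of common MS Office characters and plain ASCII stand-ins
my %generic = (
    "\x{2018}" => "'",  "\x{2019}" => "'",    # single smart quotes
    "\x{201C}" => '"',  "\x{201D}" => '"',    # double smart quotes
    "\x{2013}" => '-',  "\x{2014}" => '--',   # en dash and em dash
);

# Option 1: turn every character above 0x7F into a decimal numeric entity,
# so the output is pure ASCII and safe for XML or HTML
sub to_entities {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $s;
}

# Option 2: swap known troublesome characters for generic equivalents,
# leaving everything else (including Latin-1 characters) untouched
sub to_generic {
    my ($s) = @_;
    $s =~ s/(.)/exists $generic{$1} ? $generic{$1} : $1/gse;
    return $s;
}

print to_entities("\x{201C}Caf\x{e9}\x{201D}"), "\n";   # prints &#8220;Caf&#233;&#8221;
print to_generic("\x{201C}hi\x{201D} \x{2013} ok"), "\n"; # prints "hi" - ok
```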

Re: Re: XML::Parser Encoding (UTF-8 -> ISO-8859-1)
by Emanuel (Pilgrim) on Sep 12, 2002 at 02:32 UTC
    Thank you very, very, very much!

    This solved my headache, and now everything is inserted into the database correctly. My task for after I've slept is to dig through the FAQ at perl-xml and learn more about everything.

    You can't imagine how happy I am right now :)

    About the additional characters that won't fit into Latin-1: there won't be any occurrences of such characters. Still, I'm going to read up on this, since something like this might occur one day.

    Emanuel
