PerlMonks  


The Perl-XML FAQ has a section on encodings.

When you parse a file, the resulting data in the Perl variables will be UTF-8 encoded regardless of the source encoding. I'm not an expert on MySQL, but I wouldn't have thought that the act of INSERTing into a table would result in characters being converted from UTF-8 to ISO-8859-1.

With Perl 5.6.0 and later, you can convert a UTF-8 string to a Latin-1 string with the somewhat cryptic:

use utf8;
my $latin = pack("C*", unpack('U*', $utf));
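On Perl 5.8 and later, the core Encode module offers a more readable way to do the same conversion. The following is only a minimal sketch, assuming a 5.8+ interpreter and that the data is already a Perl character string; the sample value in $utf is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: a character string such as a parser might hand back.
my $utf = "caf\x{E9}";

# Encode the characters as ISO-8859-1 (Latin-1) bytes.
my $latin = encode('ISO-8859-1', $utf);

# Report the byte count and the final byte to confirm the conversion.
printf "%d byte(s), last byte 0x%02X\n",
    length($latin), ord(substr($latin, -1));
```

This only covers characters that exist in Latin-1; anything outside that range still needs separate handling.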

As jkahn said, it's not possible to map all Unicode characters to Latin-1. In particular, the 'smart quotes' characters from MS Office apps do not have Latin-1 equivalents. You could simply encode characters beyond 0x7f as numeric entities (if you're ultimately going to write them back out as XML or HTML), or you could replace troublesome characters with more generic equivalents. The FAQ has some code snippets for both options.
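Both options can be sketched roughly as follows. This is not the FAQ's code: the helper names to_generic and to_entities, the replacement table, and the sample string are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical table: common "smart" punctuation mapped to plain ASCII.
my %generic = (
    "\x{2018}" => "'",   "\x{2019}" => "'",    # single quotes
    "\x{201C}" => '"',   "\x{201D}" => '"',    # double quotes
    "\x{2013}" => '-',   "\x{2014}" => '--',   # dashes
    "\x{2026}" => '...',                       # ellipsis
);

# Option 1: replace troublesome characters with generic equivalents.
sub to_generic {
    my ($text) = @_;
    $text =~ s/([\x{2013}\x{2014}\x{2018}\x{2019}\x{201C}\x{201D}\x{2026}])/$generic{$1}/g;
    return $text;
}

# Option 2: turn every character beyond 0x7f into a numeric entity.
sub to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Hypothetical sample: a word wrapped in smart double quotes.
my $sample = "\x{201C}caf\x{E9}\x{201D}";

print to_entities($sample), "\n";
print to_entities(to_generic($sample)), "\n";
```

The entity form is only useful if the result is written back out as XML or HTML; for plain Latin-1 storage, the replacement table is the more natural fit.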


In reply to Re: XML::Parser Encoding (UTF-8 -> ISO-8859-1) by grantm
in thread XML::Parser Encoding (UTF-8 -> ISO-8859-1) by Emanuel
