PerlMonks  

Re: Encoding horridness

by Corion (Patriarch)
on Jul 12, 2017 at 13:10 UTC ( [id://1194922]=note )


in reply to Encoding horridness

You will also have to make sure that the data you write to the XML file was read correctly from your data source and was properly decoded at that point.

Ideally, you use Encode to decode all data when you read it into your program and to encode it when you write it to your output. You have already taken care of encoding the output, but the input might not be valid UTF-8, or Perl might not recognize it as such.

Assuming that your input data is a file with bytes encoded in Latin-1, you could read/decode the data as

use Encode qw(decode);
while (<$fh>) { my $payload = decode('Latin-1', $_); }
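To make the decode-on-input, encode-on-output idea concrete, here is a minimal sketch. It assumes the input really is Latin-1 and the desired output is UTF-8. The helper name latin1_line_to_utf8 and the in-memory filehandles are hypothetical stand-ins for the real files, and 'iso-8859-1' is the canonical Encode name for what is called 'Latin-1' above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn one line of Latin-1 octets into UTF-8 octets.
sub latin1_line_to_utf8 {
    my ($raw) = @_;
    my $text = decode('iso-8859-1', $raw);   # octets -> Perl characters
    return encode('UTF-8', $text);           # Perl characters -> octets
}

# In-memory filehandles stand in for the real input and output files
# (an assumption made only to keep the sketch self-contained).
my $input  = "caf\xE9\n";                    # "café" as Latin-1 octets
my $output = '';

open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

while (my $line = <$in>) {
    print {$out} latin1_line_to_utf8($line);
}

close $in;
close $out or die "Cannot close output: $!";
```

With real files, the same effect is commonly obtained by letting I/O layers do the work, for example opening the input with '<:encoding(iso-8859-1)' and the output with '>:encoding(UTF-8)'; the explicit calls are used here only to keep both steps visible.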

For database values, you have the additional fun of finding out in what data type and encoding your database actually stores the values.

Replies are listed 'Best First'.
Re^2: Encoding horridness
by Anonymous Monk on Jul 12, 2017 at 14:07 UTC
    Good advice, to be sure. But since Latin-1 is a subset of Unicode, isn't decode('Latin-1', $_) pretty much a no-op?

      No, because high-bit characters/octets in Latin-1 are encoded as different octets in UTF-8, and Perl doesn't know what to do with high-bit characters when writing them.

        What I'm wondering, though, is if there's ever a situation where
        encode('utf8', decode('Latin-1', $_))
        produces different output from
        encode('utf8', $_)

      The OP wants to move from Latin-1 to UTF-8. Latin-1 is not a subset of UTF-8.

        Yes, and encode('utf8', decode('Latin-1', $_)) isn't a no-op.
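The question raised in this subthread can be checked directly. The sketch below is one way to compare the two expressions. It assumes $_ holds a plain, undecoded byte string; the variable names and the hex helper are hypothetical. For such a string, Encode treats each octet as a code point from 0 to 255, which coincides with Latin-1, so the two expressions should agree. A difference shows up when the octets are really in some other encoding, such as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex octets.
sub hex_octets {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Case 1: the input is one Latin-1 octet (e-acute).
my $latin1     = "\xE9";
my $via_decode = encode('UTF-8', decode('iso-8859-1', $latin1));
my $direct     = encode('UTF-8', $latin1);

# Case 2: the input is already UTF-8 (the same e-acute as two octets).
my $utf8_in   = "\xC3\xA9";
my $as_utf8   = encode('UTF-8', decode('UTF-8', $utf8_in));
my $as_latin1 = encode('UTF-8', decode('iso-8859-1', $utf8_in));

print 'via decode: ', hex_octets($via_decode), "\n";   # C3 A9
print 'direct:     ', hex_octets($direct),     "\n";   # C3 A9
print 'as UTF-8:   ', hex_octets($as_utf8),    "\n";   # C3 A9
print 'as Latin-1: ', hex_octets($as_latin1),  "\n";   # C3 83 C2 A9
```

In other words, the explicit decode mostly documents and enforces the assumption about the input encoding. Decoding UTF-8 input as Latin-1, as in the last line, yields double-encoded output.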
