PerlMonks  

Re^2: Encoding horridness

by Anonymous Monk
on Jul 12, 2017 at 14:07 UTC (id://1194932)


in reply to Re: Encoding horridness
in thread Encoding horridness

Good advice, to be sure. But since Latin-1 is a subset of Unicode, isn't decode('Latin-1', $_) pretty much a no-op?

Re^3: Encoding horridness
by Corion (Patriarch) on Jul 12, 2017 at 14:20 UTC

    No, because high-bit characters (octets 0x80 to 0xFF) in Latin-1 are encoded as different octets in UTF-8, and Perl doesn't know which encoding to use for high-bit characters when writing them unless you tell it.
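To make the point above concrete, here is a minimal sketch (not part of the original reply) that assumes only the core Encode module. The variable names are illustrative. It takes one Latin-1 octet and prints its hex value next to the octets that the same character occupies in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# One Latin-1 octet: 0xE9 is "e with acute" in ISO-8859-1.
my $latin1_octets = "\xE9";

# Decode the octet into a Perl character string (one character, U+00E9).
my $chars = decode('ISO-8859-1', $latin1_octets);

# Encode that character as UTF-8: it now takes two octets.
my $utf8_octets = encode('UTF-8', $chars);

# Helper (hypothetical name) that renders a byte string as hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

printf "Latin-1: %s\n", hex_dump($latin1_octets);   # Latin-1: E9
printf "UTF-8:   %s\n", hex_dump($utf8_octets);     # UTF-8:   C3 A9
```

The single octet E9 becomes the pair C3 A9, so the octets written to a UTF-8 file are not the octets read from the Latin-1 file.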

      What I'm wondering, though, is if there's ever a situation where
      encode('utf8', decode('Latin-1', $_))
      produces different output from
      encode('utf8', $_)
        Yes, for example:
        use feature 'say';
        use Encode qw(decode encode);
        $_ = decode('utf-8', "\N{LATIN SMALL LETTER A WITH ACUTE}");
        say encode('utf8', $_);                     # Replacement character EFBFBD.
        say encode('utf8', decode('Latin-1', $_));  # Dies.
Re^3: Encoding horridness
by hippo (Bishop) on Jul 12, 2017 at 14:16 UTC

    The OP wants to move from Latin-1 to UTF-8. Latin-1 is not a subset of UTF-8.

      Yes, and encode('utf8', decode('Latin-1', $_)) isn't a no-op.
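A small sketch of that claim, assuming the input is a plain octet string read from a Latin-1 source (the sample string is made up for illustration). It checks that the round trip changes the octets, and that for such octet-only input the result matches encode('utf8', $_) applied directly.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Four octets: Latin-1 for "cafe" with an acute accent on the e.
my $input = "caf\xE9";

my $via_decode = encode('utf8', decode('Latin-1', $input));
my $direct     = encode('utf8', $input);

# Not a no-op: the output octets differ from the input octets.
print "changed input: ", ($via_decode ne $input ? "yes" : "no"), "\n";

# For an octet string (no code points above 0xFF) both routes agree.
print "same as direct encode: ", ($via_decode eq $direct ? "yes" : "no"), "\n";

# The high-bit octet grew from one octet to two.
print "length in octets: ", length($via_decode), "\n";
```

The two routes only diverge when the string already holds characters above U+00FF, which is the case the earlier example in this thread exercises.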
