
Thanks a lot haj. It sounds all good (and complicated). Will need some time to work it out.

BTW: You're right. My Terminal (iTerm2) is UTF-8. The OS is MacOS.


Re^3: Lost in encodings (updated: POC)
by LanX (Cardinal) on Feb 08, 2020 at 12:14 UTC
    I can only speculate as long as you don't show us a Dump of $str.

    > My Terminal (iTerm2) is UTF-8. The OS is MacOS.

    I think that if the output channel is byte-oriented (no encoding layer set), this would explain your false negative results.

    IOW your decoding is right but the test is wrong.


    Yep, tested on my Ubuntu VM, with utf8 console

    DB<2> use Devel::Peek
    DB<4> use Encode qw(decode encode)
    DB<10> $str="kü"
    DB<11> Dump $str
    SV = PV(0x28054a0) at 0x280a370
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x2838a60 "k\303\274"\0    # <-- 303 274 is octal for UTF-8 encoding of "ü" *
      CUR = 3
      LEN = 16
    DB<12> p $str
    kü
    DB<13> p $dec = decode("utf8",$str,Encode::FB_WARN)
    k
    DB<14> Dump $dec
    SV = PVMG(0x28e41f0) at 0x2a19368
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x28fe5a0 "k\303\274"\0 [UTF8 "k\x{fc}"]    # <-- correct UTF8, U+00FC is codepoint for "ü"
      CUR = 3
      LEN = 16
    DB<15> binmode DB::OUT,':utf8';    # <-- fix encoding layer
    DB<16> p $dec
    kü
    DB<17>
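For anyone who wants to repeat this outside the debugger, here is a minimal standalone sketch of the same check. It assumes a UTF-8 terminal. The variable names `$bytes` and `$chars` are mine, not from the session above. I use `FB_CROAK | LEAVE_SRC` instead of `FB_WARN` so that a bad input dies loudly and the source string is left untouched.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 octets for "kü", written with escapes so the
# encoding of this source file does not matter.
my $bytes = "k\xC3\xBC";

# Decode octets into Perl characters. LEAVE_SRC keeps $bytes intact;
# without it, a CHECK value such as FB_CROAK may modify the source in place.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "bytes: length %d\n", length $bytes;    # 3 octets
printf "chars: length %d, last char U+%04X\n",
    length $chars, ord substr($chars, -1);     # 2 characters, U+00FC

# Without an encoding layer, Perl would write U+00FC as the single
# byte 0xFC, which is not valid UTF-8 and looks broken on the terminal.
# Adding the layer makes the output match the terminal.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```

The point is that the check on `length` and `ord` does not depend on the terminal at all, so it is a more reliable test of the decoding than looking at what `print` shows.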

    Unicode                UTF-8    UTF-8
    Codepoint    SYMBOL    hex      oct        NAME
    U+00FC       ü         c3 bc    303 274    LATIN SMALL LETTER U WITH DIAERESIS


    DB<19> printf "%X ", $_ for  0303, 0274                                                                               
      C3 BC                                                                                                                     
    DB<20> printf "%X ", oct($_) for  qw/303 274/                                                                         
      C3 BC                                                                                                                              
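The same hex and octal values can also be derived from the codepoint instead of being typed in by hand. This is only a sketch. The names `$octets`, `@hex` and `@oct` are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the single character U+00FC into its UTF-8 octets.
my $octets = encode('UTF-8', chr(0xFC));

# Format each octet as hex and as octal.
my @hex = map { sprintf '%02x', ord($_) } split //, $octets;
my @oct = map { sprintf '%o',   ord($_) } split //, $octets;

print "hex: @hex\n";    # hex: c3 bc
print "oct: @oct\n";    # oct: 303 274
```

The octal values are the ones Devel::Peek shows in the PV line of the Dump output.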

    Cheers Rolf
    (addicted to the Perl Programming Language :)