http://qs321.pair.com?node_id=11112614


in reply to Re^2: Lost in encodings
in thread Lost in encodings

I can only speculate as long as you don't show us a Dump of $str.

> My Terminal (iTerm2) is UTF-8. The OS is MacOS.

I think the output channel is byte-oriented, i.e. it has no encoding layer, and that would explain your false negatives: the decoded "ü" is written as the single byte FC, which a UTF-8 terminal can't display.

IOW your decoding is right but the test is wrong.

update

Yep, tested on my Ubuntu VM with a UTF-8 console:

DB<2> use Devel::Peek
DB<4> use Encode qw(decode encode)
DB<10> $str="kü"
DB<11> Dump $str
SV = PV(0x28054a0) at 0x280a370
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2838a60 "k\303\274"\0     # <-- 303 274 is octal for UTF-8 encoding of "ü" *
  CUR = 3
  LEN = 16
DB<12> p $str
kü
DB<13> p $dec = decode("utf8",$str,Encode::FB_WARN)
k
DB<14> Dump $dec
SV = PVMG(0x28e41f0) at 0x2a19368
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x28fe5a0 "k\303\274"\0 [UTF8 "k\x{fc}"]     # <-- correct UTF8, U+00FC is codepoint for "ü"
  CUR = 3
  LEN = 16
DB<15> binmode DB::OUT,':utf8';     # <-- fix encoding layer
DB<16> p $dec
kü
DB<17>

Unicode    SYMBOL  UTF-8  UTF-8    NAME
Codepoint          hex    oct
U+00FC     ü       c3 bc  303 274  LATIN SMALL LETTER U WITH DIAERESIS

*)

DB<19> printf "%X ", $_ for  0303, 0274
  C3 BC
DB<20> printf "%X ", oct($_) for  qw/303 274/
  C3 BC
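If you want to check this outside the debugger, here is a minimal sketch of the same experiment as a standalone script. It is my own illustration, not taken from your code: the helper name bytes_written is made up, and I write to in-memory handles instead of the terminal so the bytes can be inspected directly. I also add Encode::LEAVE_SRC so that decode doesn't modify $str in place.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw input: three bytes, "k" followed by the UTF-8 encoding of "ü" (C3 BC).
my $str = "k\xC3\xBC";

# Decode the bytes into a character string: "k" plus U+00FC (two characters).
# LEAVE_SRC keeps decode() from consuming $str in place when a CHECK is given.
my $dec = decode('UTF-8', $str, Encode::FB_WARN() | Encode::LEAVE_SRC());

# Hypothetical helper: print $text to an in-memory handle and return the
# bytes that were actually written, as hex. $layer is an optional PerlIO
# layer such as ':encoding(UTF-8)'.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    print {$fh} $text;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

my $raw     = bytes_written($dec);                      # byte-oriented handle
my $encoded = bytes_written($dec, ':encoding(UTF-8)');  # handle with encoding layer

printf "length of \$str: %d, length of \$dec: %d\n", length $str, length $dec;
print  "without layer: $raw\n";      # 6B FC
print  "with layer:    $encoded\n";  # 6B C3 BC
```

Without a layer the decoded "ü" goes out as the lone byte FC, which is not valid UTF-8, so the terminal shows garbage even though the decoding worked. With the layer you get C3 BC again, which is what the console expects.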

Cheers Rolf
(addicted to the Perl Programming Language :)