in reply to Re^2: Lost in encodings
in thread Lost in encodings
I can only speculate as long as you don't show us a Dump of $str.
> My Terminal (iTerm2) is UTF-8. The OS is MacOS.
I think that if your output handle is byte-oriented (no encoding layer), this would explain your false negatives: the decoded "ü" is written as the single byte 0xFC, which a UTF-8 terminal cannot display.
IOW your decoding is right but the test is wrong.
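One way to check the decoding without trusting the terminal at all is to look at lengths and codepoints, which print as plain ASCII. This is only a minimal sketch: I am assuming $str holds the raw UTF-8 bytes for "kü", since I haven't seen your actual data, and I use FB_CROAK with LEAVE_SRC so that malformed input dies and $str is left untouched.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: $str contains the raw UTF-8 bytes for "kü" (k, 0xC3, 0xBC).
my $str = "k\xC3\xBC";

# Decode bytes into Perl characters. FB_CROAK dies on malformed input,
# LEAVE_SRC stops decode() from modifying $str in place.
my $dec = decode('UTF-8', $str, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Inspect the result as ASCII only, so the output channel cannot distort it.
my @codepoints = map { sprintf 'U+%04X', ord } split //, $dec;
printf "bytes in: %d, characters out: %d\n", length($str), length($dec);
print "codepoints: @codepoints\n";

# To print the decoded text itself, give the handle an encoding layer first.
binmode STDOUT, ':encoding(UTF-8)';
print "$dec\n";
```

If the decoding is right you should see 3 bytes in, 2 characters out, and U+006B U+00FC, no matter how the terminal is configured.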
update
Yep, tested on my Ubuntu VM, with utf8 console
DB<2> use Devel::Peek
DB<4> use Encode qw(decode encode)
DB<10> $str="kü"
DB<11> Dump $str
SV = PV(0x28054a0) at 0x280a370
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2838a60 "k\303\274"\0    # <-- 303 274 is octal for UTF-8 encoding of "ü" *
  CUR = 3
  LEN = 16
DB<12> p $str
kü
DB<13> p $dec = decode("utf8",$str,Encode::FB_WARN)
k
DB<14> Dump $dec
SV = PVMG(0x28e41f0) at 0x2a19368
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x28fe5a0 "k\303\274"\0 [UTF8 "k\x{fc}"]    # <-- correct UTF8 U+00FC is codepoint for "ü"
  CUR = 3
  LEN = 16
DB<15> binmode DB::OUT,':utf8';    # <-- fix encoding layer
DB<16> p $dec
kü
DB<17>
Unicode     SYMBOL   UTF-8   UTF-8     NAME
Codepoint            hex     oct
U+00FC      ü        c3 bc   303 274   LATIN SMALL LETTER U WITH DIAERESIS
*)
DB<19> printf "%X ", $_ for 0303, 0274
C3 BC
DB<20> printf "%X ", oct($_) for qw/303 274/
C3 BC
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice