http://qs321.pair.com?node_id=11112588


in reply to Re: Lost in encodings
in thread Lost in encodings

> ... I recommend against Data::Peek. It will tell you only what you already know ("that's not right") but not give guidance how to fix.

It's true Devel::Peek gives no guidance why, but which non-AI command does? °

And how can you know his console is using utf-8? Could be Windows and CP850.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Update

°) Those commands will show me the hex codes in ASCII, which every terminal (and the monastery) displays correctly.

Re^3: Lost in encodings
by haj (Vicar) on Feb 07, 2020 at 21:08 UTC
    > And how can you know his console is using utf-8? Could be Windows and CP850.

    I wouldn't claim I know. But since length 'Kü' is 3 while the string still displays as 'Kü', I guessed that a multibyte encoding is in place. CP850 is a single-byte encoding and would behave differently: there, length would report 2.

    As for Devel::Peek: Those commands will show me the hex codes in ASCII

    Devel::Peek will also issue several lines of data which are totally useless unless you're debugging XS code or Perl itself. A decent print unpack 'H*',$data does the same with less fuss.
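A minimal sketch of the unpack approach, assuming $data holds raw bytes. The helper name hex_bytes is made up for illustration; it only splits the 'H*' output into byte pairs so that values at 0x80 and above are easier to spot.

```perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as a two-digit hex pair.
# Assumes $data is a byte string (no code points above 0xFF).
sub hex_bytes {
    my ($data) = @_;
    return join ' ', unpack '(H2)*', $data;
}

# 'Künzler' encoded as UTF-8: the ü takes two bytes, 0xC3 0xBC.
print hex_bytes("K\xC3\xBCnzler"), "\n";   # 4b c3 bc 6e 7a 6c 65 72

# The same word encoded as CP850: the ü is the single byte 0x81.
print hex_bytes("K\x81nzler"), "\n";       # 4b 81 6e 7a 6c 65 72
```

Either way you still have to know which byte values to look for, and the output says nothing about the utf8 flag.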

      For completeness–

      perl -Mutf8 -CSD -E 'say length "Kü"' # 2
        Yes, but only if it's a character string, i.e. the utf8 flag is set.

        But the OP said the flag is not set.
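To illustrate the difference, here is a small sketch that assumes the OP's string arrived as undecoded UTF-8 bytes (that is a guess, not something the thread confirms). It compares length and the utf8 flag before and after decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of 'Kü' as raw bytes: K, then 0xC3 0xBC for ü.
my $bytes = "K\xC3\xBC";

# Decoding turns the three bytes into two characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, utf8 flag %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';   # length 3, flag off
printf "chars: length %d, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';   # length 2, flag on
```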

        edit

        not sure what -CSD means.

        update

        Got it, from perlrun:

        The -C flag controls some of the Perl Unicode features.

        As of 5.8.1, the -C can be followed either by a number or a list of option letters. The letters, their numeric values, and effects are as follows; listing the letters is equal to summing the numbers.

            I     1   STDIN is assumed to be in UTF-8
            O     2   STDOUT will be in UTF-8
            E     4   STDERR will be in UTF-8
            S     7   I + O + E
            i     8   UTF-8 is the default PerlIO layer for input streams
            o    16   UTF-8 is the default PerlIO layer for output streams
            D    24   i + o
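For use inside a script rather than on the command line, a rough sketch: `use open qw(:std :utf8)` is assumed here to be a close in-script approximation of -CSD, not an exact equivalent. The string is written with an escape so the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use open qw(:std :utf8);    # assumption: roughly what -CSD does on the command line

my $name = "K\x{FC}";       # 'Kü' as a character string: two code points
print length($name), "\n";  # 2
print "$name\n";            # leaves STDOUT as the UTF-8 bytes 4b c3 bc 0a
```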

        Cheers Rolf

      It's verbose but will include the utf8 flag plus the dump showing the codepoints in hex. °

      Which is more helpful for us than the OP's copy and paste.

      > A decent print unpack 'H*',$data does the same with less fuss.

      True, but does unpack tell me "why" it went wrong? ;)

      Update

      > but displays as 'Kü',

      Provided code areas in the monastery are encoded in utf8. I vividly remember problems here. *

      Cheers Rolf

      *) the monastery is using windows-1252

      °)

      --- Testing: Täst:  "T\xE4st"
      SV = PVMG(0x29c3c98) at 0x29c1fa8
        REFCNT = 1
        FLAGS = (SMG,POK,pIOK,pNOK,pPOK,UTF8)
        IV = 0
        NV = 0
        PV = 0x24c9c68 "T\303\244st"\0 [UTF8 "T\x{e4}st"]
        CUR = 5
        LEN = 10
        MAGIC = 0x2ab1b38
          MG_VIRTUAL = &PL_vtbl_utf8
          MG_TYPE = PERL_MAGIC_utf8(w)
          MG_LEN = 4
      
      Hi again Harald

      > A decent print unpack 'H*',$data does the same with less fuss.

      Actually, why should I bother to spot the non-ASCII between all the hex-codes? °

      Please compare

      DB<50> $data = 'Künzler'

      DB<51> print unpack 'H*',$data
      4b816e7a6c6572                          # ORLY?

      DB<52> use Data::Dump qw/pp dd/

      DB<53> dd $data
      "K\x81nzler"                            # <---

      DB<54> use Devel::Peek

      DB<55> Dump $data
      SV = PVNV(0xd9adb8) at 0x351ac30
        REFCNT = 1
        FLAGS = (POK,IsCOW,pIOK,pNOK,pPOK)
        IV = 0
        NV = 0
        PV = 0x355bcd8 "K\201nzler"\0         # <---
        CUR = 7
        LEN = 10
        COW_REFCNT = 2

      DB<56>

      Hint: this time not UTF8, did you notice easily?
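If you'd rather let Perl do the spotting, here is a hedged sketch: try a strict UTF-8 decode and see whether it croaks. The helper name looks_like_utf8 is invented for this example, and the CP850 decode at the end is only an assumption about the console, since the bytes alone cannot prove which single-byte encoding was used. Pure ASCII input also passes the check.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if $bytes decodes cleanly as strict UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

print looks_like_utf8("K\xC3\xBCnzler") ? "utf8\n" : "not utf8\n";   # utf8
print looks_like_utf8("K\x81nzler")     ? "utf8\n" : "not utf8\n";   # not utf8

# Assuming the bytes are CP850, decoding yields the intended ü (U+00FC).
my $chars = decode('cp850', "K\x81nzler");
printf "U+%04X\n", ord(substr($chars, 1, 1));                        # U+00FC
```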

      Devel::Peek is core and shows several relevant pieces of information in one command.

      It has some minor disadvantages, but if the OP had shown us the output, we'd have known immediately that his code is correct, apart from the debugger settings.

      Telling people explicitly not to use it is pretty surprising ...

      Cheers Rolf

      °) yes I know that ASCII is below 0x80 and how to spot utf8 multi-bytes. But do others?

      And normally I use a water heater when I need tea and don't start to collect decent wood in the forest. ;-)