http://qs321.pair.com?node_id=11101519


in reply to Re: Special character not being captured
in thread Special character not being captured

I'm curious how \x{FFFD} became a hash key?

>perl -MEncode=decode -MData::Dump=dd -E "dd decode q(UTF-8), substr qq(\xC3\x86),0,1"
"\x{FFFD}"

but I don't see Lady_Aleena decoding anything.

Re^3: Special character not being captured
by choroba (Cardinal) on Jun 18, 2019 at 12:59 UTC
    decode expects the input to be in UTF-8, but you supplied only the byte \xC3. On its own it isn't a complete UTF-8 sequence, so it's decoded to the Replacement Character, U+FFFD.

    You need

    Encode::encode("UTF-8", substr(Encode::decode("UTF-8", "\xC3\x86"),0,1))
    to get UTF-8 Æ back.
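
As a minimal sketch of the difference, the script below runs substr on the raw octets and on the decoded string side by side. The variable names are made up for illustration and are not from the original code; only Encode's decode and encode are assumed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two octets that encode U+00C6 (Æ) in UTF-8.
my $octets = "\xC3\x86";

# substr on the raw octets yields only the lead byte \xC3,
# which is an incomplete UTF-8 sequence on its own.
my $first_octet = substr $octets, 0, 1;

# With the default CHECK value, decode() substitutes U+FFFD for malformed input.
my $broken = decode('UTF-8', $first_octet);

# Decoding first gives a one-character string, so substr returns the whole Æ.
my $first_char = substr decode('UTF-8', $octets), 0, 1;

# Encoding that character again produces the original two octets.
my $round_trip = encode('UTF-8', $first_char);

printf "octet-wise:     U+%04X\n", ord $broken;
printf "character-wise: U+%04X\n", ord $first_char;
printf "re-encoded:     %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $round_trip;
```

Printing code points in hex rather than the characters themselves keeps the result readable whatever encoding the terminal uses.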

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Right: the UTF-8 input wasn't decoded, so a string of octets was passed to first_alpha. Its first octet, \xC3, was returned and should have become the hash key, but somehow the key is the Replacement Character. Then again, the dump as shown (a hash of arrays) cannot be alpha_hash output, so it's not really an SSCCE and I can only guess.

        It might be the terminal that changes the character into the replacement one:
        $ perl -wE '%h = ("\xc3" => 1); say $_, ord for keys %h'
        �195
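
One way to take the terminal out of the picture is to print the code points of each key instead of the key itself. This is only a sketch: code_points is a hypothetical helper, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: format every character of a string as a hex code point.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# A hash whose only key is the single octet \xC3, as in the one-liner above.
my %h = ("\xC3" => 1);

# The output is plain ASCII, so it does not depend on how the terminal
# renders non-ASCII bytes.
my @dumped = map { code_points($_) } sort keys %h;

print "$_\n" for @dumped;
```

If a key really had been turned into the replacement character inside the program, it would show up as U+FFFD here rather than U+00C3.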
        
Re^3: Special character not being captured
by Lady_Aleena (Priest) on Jun 18, 2019 at 20:39 UTC

    Do you want to see step-by-step how I got to the point where the special character Æ got into first_alpha with all code along the way?

    No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
    Lady Aleena

      I suspect something happens after first_alpha, i.e. to its result (\xC3 transformed to \x{FFFD}). But that was idle curiosity on my part, only tangentially related to your problem, which choroba's advice solved. It's not worth investigating, so never mind.