http://qs321.pair.com?node_id=11101520


in reply to Re^2: Special character not being captured
in thread Special character not being captured

decode expects the input to be in UTF-8, but you supplied the single byte \xC3. On its own that byte is not a complete UTF-8 sequence (it is only the lead byte of a two-byte sequence), so it's decoded to the Replacement Character U+FFFD.

You need

Encode::encode("UTF-8", substr(Encode::decode("UTF-8", "\xC3\x86"), 0, 1))
to get UTF-8 Æ back.
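
A minimal sketch of the difference, assuming the default (lenient) CHECK behaviour of decode. The variable names are only illustrative and are not taken from the original code:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A lone \xC3 is only the lead byte of a two-byte UTF-8 sequence,
# so a lenient decode() substitutes the Replacement Character for it.
my $broken = decode('UTF-8', "\xC3");

# The complete sequence \xC3\x86 decodes to a single character, U+00C6 (Æ).
my $chars = decode('UTF-8', "\xC3\x86");
my $first = substr($chars, 0, 1);

# Encoding that first character gives back the original two octets.
my $octets = encode('UTF-8', $first);

# Print code points and octets as hex so the terminal cannot distort them.
printf "broken: U+%04X\n", ord $broken;
printf "first:  U+%04X\n", ord $first;
printf "octets: %s\n", join ' ', map { sprintf '%02X', ord } split //, $octets;
```

The point is that substr has to work on decoded characters, not on octets; taking the first octet of the undecoded input cuts the two-byte sequence in half.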

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^4: Special character not being captured
by vr (Curate) on Jun 18, 2019 at 13:09 UTC

    Right: the UTF-8 input wasn't decoded, so a string of octets was passed to first_alpha, and the first octet, \xC3, was returned and should have become the hash key. Yet somehow the key shows up as the Replacement Character. Then again, the dump as shown (a hash of arrays) cannot be the output of alpha_hash, so this isn't really an SSCCE and we can only guess.

      It might be the terminal that changes the character into the replacement one:
      $ perl -wE '%h = ("\xc3" => 1); say $_, ord for keys %h'
      �195
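
One way to take the terminal out of the picture is to print the key's code points as hex instead of the raw character. This is only a sketch; the hash mirrors the one-liner above and `$hex` is an illustrative name:

```perl
use strict;
use warnings;

my %h = ("\xC3" => 1);
my ($key) = keys %h;

# Format each character's code point as hex. The output is plain ASCII,
# so whatever the terminal does with a stray \xC3 byte no longer matters.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $key;
print "key code points: $hex\n";
```

If this shows C3 while the raw output shows the replacement glyph, the substitution is happening in the terminal and not in the hash.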
      