 
PerlMonks  

Re: Malformed UTF-8 character

by ikegami (Patriarch)
on Nov 30, 2022 at 14:08 UTC


in reply to Malformed UTF-8 character

That indicates a scalar which became corrupted when Perl or XS code improperly decoded a string.
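To make "corrupt scalar" concrete, here is a minimal sketch that builds one on purpose. It assumes the internal Encode::_utf8_on function as a stand-in for buggy Perl or XS code: it turns on the UTF-8 flag without checking the bytes. utf8::valid then reports whether the scalar's internal state is consistent.

```perl
use strict;
use warnings;
use Encode ();

my $good = "\xE2\x80\x93";   # U+2013 EN DASH encoded using UTF-8
my $bad  = "\x96";           # lone continuation byte, not valid UTF-8

# Simulate an improper decode: flag the bytes as UTF-8 without validating.
# (Illustration only. Real code should use Encode::decode instead.)
Encode::_utf8_on($good);
Encode::_utf8_on($bad);

print "good: ", ( utf8::valid($good) ? "consistent" : "corrupt" ), "\n";
print "bad:  ", ( utf8::valid($bad)  ? "consistent" : "corrupt" ), "\n";
```

Only the second scalar should be reported as corrupt, since its flagged buffer doesn't hold well-formed UTF-8.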

For example, use utf8; doesn't validate if the source code is actually valid UTF-8, and produces corrupt scalars if it's not.

    $ not_utf8="$( printf "\x96" )"
    $ perl -e"use utf8; q{$not_utf8}"
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) at -e line 1.
    Malformed UTF-8 character (fatal) at -e line 1.

(Fortunately, use utf8; catches the problem and bails.)

Are you using use utf8; with a source file that isn't encoded using UTF-8?

The likely culprit is a U+2013 EN DASH ("–") encoded using cp1252.
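One way to check that theory is to look at the raw bytes. The following is a sketch only: is_valid_utf8 is a hypothetical helper name, and the two byte strings stand in for whatever is actually in your source file. It tests whether the bytes are well-formed UTF-8, then shows what the stray byte becomes when decoded as cp1252.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: true if $bytes is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input.
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $cp1252_dash = "\x96";           # EN DASH as encoded by cp1252
my $utf8_dash   = "\xE2\x80\x93";   # EN DASH as encoded by UTF-8

printf "cp1252 bytes valid UTF-8? %s\n", is_valid_utf8($cp1252_dash) ? "yes" : "no";
printf "UTF-8 bytes valid UTF-8?  %s\n", is_valid_utf8($utf8_dash)   ? "yes" : "no";

# Decoding the stray byte as cp1252 recovers the intended character.
printf "U+%04X\n", ord decode('cp1252', $cp1252_dash);
```

If the last line prints U+2013, the file was most likely saved as cp1252, and either re-saving it as UTF-8 or dropping use utf8; should make the error go away.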


Using the :utf8 encoding layer can also produce corrupt scalars.

    $ printf "\x96" | perl -nle'
       use open ":std", ":utf8";
       printf "%vX\n", $_;
    '
    Malformed UTF-8 character: \x96 (unexpected continuation byte 0x96, with no preceding start byte) in printf at -e line 1, <> line 1.
    0

That's why :encoding(UTF-8) should be used instead.
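For comparison, here is a rough sketch of the same bad byte read through the validating layer. It assumes an in-memory file in place of STDIN so that it is self-contained. With the default settings the layer normally warns about the byte on STDERR instead of passing it through untouched, so the scalar you get back is well-formed.

```perl
use strict;
use warnings;

# The lone byte 0x96 is not valid UTF-8.
my $bytes = "\x96\n";

# Read from an in-memory file through the validating layer.
open(my $fh, '<:encoding(UTF-8)', \$bytes)
    or die "Can't open in-memory file: $!";
my $line = <$fh>;
close($fh);
chomp($line);

# Unlike with :utf8, the resulting scalar is in a consistent state.
print utf8::valid($line) ? "consistent\n" : "corrupt\n";
```

To make malformed input fatal rather than a warning, decoding by hand with Encode::decode and Encode::FB_CROAK is another option.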

Replies are listed 'Best First'.
Re^2: Malformed UTF-8 character
by BillKSmith (Monsignor) on Dec 03, 2022 at 13:52 UTC
    I wish that I had recognized that your "likely suspect" was the key to the whole mystery.
    Bill
