Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Is there some universal Unicode+UTF8 switch?

by haj (Vicar)
on Sep 02, 2019 at 02:03 UTC ( [id://11105399]=note: print w/replies, xml ) Need Help??


in reply to Is there some universal Unicode+UTF8 switch?

So my question is: does modern Perl have some "global super-mode use utf8"? So it would be like use utf8 - once placed at the beginning. But the effect would be to kill in the program any memories of any encoding but Unicode. So literally would mean "in this program there is nothing but Unicode. You get Unicode, you output Unicode. There is nothing in this world but Unicode." Something like this.

Such a general switch is likely to break things. But first, let me clarify: Unicode is not utf8. Unicode is a concept that assigns a number ("code point") to every character considered worthy by the Unicode consortium. UTF-8 is a recipe to map strings built from those characters to sequences of octets - and interpret sequences of octets as a Unicode string.

This distinction is important because it makes clear that the concept of "Unicode", and of its encodings, only applies to text. Binary data in a Perl program are not Unicode strings, and binary data in files or read with LWP are no valid UTF-8 (most of the times).

Your code JSON->new->utf8(0)->decode($response->content) looks just wrong. You should investigate how the correct invocation somehow mangles things. Step by step:

  • $response->content from LWP provides octets. The web server must specify the content type and encoding, and this is the encoding which must be used to interpret these octets, regardless from any settings in the program.
  • The combination JSON->new->utf8(0)->decode(...) expects a Unicode string, and you're feeding bytes. So, these bytes are fed into your JSON structure.
  • This would explain why you need to run $unicode_literal = decode('utf-8', $data->{result}) on your JSON fields.
These are two errors cancelling each other. You need to get rid of both of them, and I'm sure the program gets easier to understand after that.
  • Comment on Re: Is there some universal Unicode+UTF8 switch?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11105399]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-04-19 05:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found