PerlMonks  

Re: Converting entities in JSON context

by choroba (Cardinal)
on May 19, 2022 at 12:48 UTC


in reply to Converting entities in JSON context

You can fix the output by enabling the correct IO layer for the output:
binmode *STDOUT, ':encoding(UTF-8)';

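Perl JSON modules keep the strings in parsed structures as character strings, but the JSON text itself is handled as UTF-8 encoded bytes: decode_json expects bytes, and encode_json returns them. Printing a decoded character string therefore needs an encoding layer on the output handle.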
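To make the characters/bytes distinction concrete, here is a minimal sketch. It assumes JSON::PP (a core module); the variable names are only illustrative, and other JSON modules that export decode_json and encode_json should behave the same way.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

# UTF-8 encoded bytes, as they would arrive from a file or a socket.
# "\xC3\xB6" is the two-byte UTF-8 encoding of U+00F6 (o with diaeresis).
my $json_bytes = qq({"school":"E\xC3\xB6tv\xC3\xB6s"});

# decode_json expects UTF-8 bytes and returns character strings.
my $data       = decode_json($json_bytes);
my $school     = $data->{school};
my $char_count = length $school;       # length counts characters here

# encode_json goes the other way: characters in, UTF-8 bytes out.
my $out_bytes  = encode_json({ school => $school });
my $byte_count = length $out_bytes;    # length counts bytes here

# $school is a character string, so the handle needs an encoding layer.
binmode STDOUT, ':encoding(UTF-8)';
print "School: $school\n";
print "characters in value: $char_count, bytes in JSON text: $byte_count\n";
```

Note that $out_bytes is already encoded: printing it through the same :encoding(UTF-8) layer would encode it a second time.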

Update:

According to the specification, JSON doesn't use entities, but it can use the \uXXXX notation, so instead of using HTML::Entities, you can try

use JSON::PP;   # assumed here; any module exporting decode_json (JSON, JSON::XS, Cpanel::JSON::XS) will do

# Turn four-digit hexadecimal entities such as &#x00F6; into JSON \uXXXX escapes.
sub convert {
    my ($s) = @_;
    $s =~ s/&#x([[:xdigit:]]{4});/\\u$1/gr
}

my $entities_json = '{"school":"E&#x00F6;tv&#x00F6;s Lor&#x00E1;nd University"}';
my $converted_json = convert($entities_json);

print "Original JSON: [$entities_json]\n";
print "Converted JSON: [$converted_json]\n";
# [{"school":"E\u00F6tv\u00F6s Lor\u00E1nd University"}]

my $decoded_json = decode_json($converted_json);
binmode STDOUT, ':encoding(UTF-8)';
print "School: " . $decoded_json->{'school'} . "\n";
# Eötvös Loránd University
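The substitution above only matches entities with exactly four hexadecimal digits. If the input may also contain decimal entities, shorter or longer hexadecimal ones, or code points above U+FFFF, something along these lines might work. This is a sketch rather than part of the original answer: json_escape and convert_entities are hypothetical names, and JSON::PP is again assumed.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: render one code point as JSON \uXXXX escapes.
# Code points above U+FFFF are written as a UTF-16 surrogate pair,
# which is how JSON represents them in escaped form.
sub json_escape {
    my ($cp) = @_;
    return sprintf('\\u%04X', $cp) if $cp < 0x10000;
    $cp -= 0x10000;
    return sprintf('\\u%04X\\u%04X', 0xD800 + ($cp >> 10), 0xDC00 + ($cp & 0x3FF));
}

# Hypothetical helper: replace numeric entities (&#xHEX; or &#DEC;)
# with the corresponding JSON escapes. Named entities are left alone.
sub convert_entities {
    my ($s) = @_;
    return $s =~ s{&#(?:x([0-9A-Fa-f]+)|([0-9]+));}{json_escape(defined $1 ? hex($1) : $2)}ger;
}

my $converted = convert_entities('{"a":"&#x1F600; &#246; &#xF6;"}');
my $decoded   = decode_json($converted);

binmode STDOUT, ':encoding(UTF-8)';
print "Converted JSON: [$converted]\n";
print "Decoded value:  [$decoded->{a}]\n";
```

Because every numeric entity becomes an escape sequence, an entity for a double quote or a backslash cannot break the surrounding JSON string.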

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: Converting entities in JSON context
by Anonymous Monk on May 19, 2022 at 12:58 UTC

    I'm not sure I understand. It doesn't seem to be about the display only. If I write a test for this (which is where I originally discovered this situation), something like:

    is($decoded_JSON->{'school'}, "Eötvös Loránd University", "convert_entities correctly converted HTML entities in a JSON context, and yielded good JSON at the end");

    , this test fails. What should I be expecting from the test, or how do I write a test to make sure that the JSON I'm sending is what I should be sending?

      Here, you're using non-ASCII characters in the source code (the second argument to is).

      To tell Perl how to interpret them, you need to

      use utf8;

      This makes Perl interpret the part of the source in the lexical scope of the pragma as UTF-8 encoded. (Update: And you need to save the source as UTF-8, too, of course.)
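      The effect of the pragma can be seen in a small test script. This is only a sketch: it assumes the file is saved as UTF-8, uses JSON::PP and Test::More, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Test::More;
use JSON::PP qw(decode_json);

# The decoded value is a character string (24 characters).
my $decoded = decode_json('{"school":"E\u00F6tv\u00F6s Lor\u00E1nd University"}');

# Without the pragma, the literal is the raw UTF-8 bytes from the file.
my $bytes_literal = do { no utf8;  "Eötvös Loránd University" };

# With the pragma, the same literal is read as characters.
my $chars_literal = do { use utf8; "Eötvös Loránd University" };

is(length $bytes_literal, 27, 'without utf8 the literal is 27 bytes');
is(length $chars_literal, 24, 'with utf8 the literal is 24 characters');
is($decoded->{school}, $chars_literal, 'decoded value matches the character literal');
isnt($decoded->{school}, $bytes_literal, 'decoded value differs from the byte literal');

done_testing;
```

      The last comparison is the situation from the failing test: a decoded character string compared against a byte literal.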


        D'oh! Yes, of course! Makes sense now. Thank you!
