PerlMonks  


Update: PerlMonks seems to have trouble rendering some of this. The question is also on Stack Overflow.

I'm getting some corrupted JSON and I've reduced it down to this test case.

    use utf8;
    use 5.18.0;
    use Test::More;
    use Test::utf8;
    use JSON::XS;

    BEGIN {    # damn it
        my $builder = Test::Builder->new;
        foreach (qw/output failure_output todo_output/) {
            binmode $builder->$_, ':encoding(UTF-8)';
        }
    }
    foreach my $string ( 'Deliver «French Bread»', '日本国' ) {
        my $hashref = { value => $string };
        is_sane_utf8 $string, "String: $string";
        my $json = encode_json($hashref);
        is_sane_utf8 $json, "JSON: $json";
        say STDERR $json;
    }
    diag ord('»');
    done_testing;

And this is the output:

    utf8.t ..
    ok 1 - String: Deliver «French Bread»
    not ok 2 - JSON: {"value":"Deliver «French Bread»"}
    #   Failed test 'JSON: {"value":"Deliver «French Bread»"}'
    #   at utf8.t line 17.
    # Found dodgy chars "<c2><ab>" at char 18
    # String not flagged as utf8...was it meant to be?
    # Probably originally a LEFT-POINTING DOUBLE ANGLE QUOTATION MARK char - codepoint 171 (dec), ab (hex)
    {"value":"Deliver «French Bread»"}
    ok 3 - String: 日本国
    ok 4 - JSON: {"value":"æ&#151;¥æ&#156;¬å&#155;½"}
    1..4
    {"value":"日本国"}
    # 187

So the string containing guillemets («») is valid UTF-8, but the resulting JSON is not. What am I missing? The `utf8` pragma is correctly marking my source. Further, that trailing 187 is from the diag, i.e. it is `ord('»')`. That's less than 255, so it almost looks like a variant of the old Unicode bug in Perl. (And the test output still looks like crap. Never could quite get that right with Test::Builder.)

Switching to `JSON::PP` produces the same output.

Further testing shows the same failure for every character in the range 127 to 255.

This is Perl 5.18.1 running on OS X Yosemite.
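
One thing worth checking, sketched here with core modules only: `encode_json` is documented as returning UTF-8 encoded octets rather than a character string, so a check that hunts for byte pairs resembling double-encoded Latin-1 (which is my understanding of what `is_sane_utf8` does) would be expected to flag `<c2><ab>`. The sketch uses `JSON::PP` because it ships with Perl and reportedly behaves the same as `JSON::XS` here; the variable names `$octets`, `$chars` and `$chars_direct` are illustrative only, and the file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                  # source literals below are UTF-8
use JSON::PP ();           # core module; stands in for JSON::XS in this sketch
use Encode qw(decode);

my $string = 'Deliver «French Bread»';

# encode_json returns UTF-8 encoded octets, not a character string.
my $octets = JSON::PP::encode_json({ value => $string });

# Character-string equivalents: decode the octets ...
my $chars = decode('UTF-8', $octets);

# ... or use an encoder object without the utf8 option enabled.
my $chars_direct = JSON::PP->new->encode({ value => $string });

# Each guillemet is one character but two octets, so the lengths differ by 2.
printf "octets length: %d\n", length $octets;
printf "chars length:  %d\n", length $chars;

# Offset 18 is where the test reported the "dodgy chars".
printf "octets at 18:  %s\n", unpack('H*', substr($octets, 18, 2));
printf "same text:     %s\n", $chars eq $chars_direct ? 'yes' : 'no';
```

If the octets are what should go over the wire, they can be written to a raw filehandle as they are; if a character string is wanted for `is_sane_utf8` or for an `:encoding(UTF-8)` handle, the decoded form is the one to test and print.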


In reply to JSON::XS (and JSON::PP) appear to generate invalid UTF-8 for character in range 127 to 255 by Ovid
