http://qs321.pair.com?node_id=11137957


in reply to Re^2: Can someone please write a *working* JSON module (Send money)
in thread Can someone please write a *working* JSON module

Since JSON is (supposed to be) UTF-8, you merely need to set the UTF-8 flag on the resulting scalars, which tells Perl that their bytes are already UTF-8 encoded characters. You could even do that for all string data, assuming that all your input has been verified as UTF-8. See for example Re: Bypass utf-8 encoding/decoding?; the function/macro you want is newSVpvn_utf8.

Obviously, this implies that you're trusting your input data to actually be valid UTF-8...
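If you'd rather not take the input on trust, a check along the following lines could run before the flag is set. This is only a minimal sketch in plain C with no Perl headers; utf8_is_valid is a hypothetical helper name, not part of Perl's API. As far as I know, perlapi's is_utf8_string does a similar job inside XS, though against Perl's own, more permissive notion of UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if buf[0..len) is well-formed UTF-8
 * (as defined by RFC 3629), else 0. Rejects overlong forms, surrogates
 * (U+D800..U+DFFF), values above U+10FFFF, stray continuation bytes,
 * and sequences cut short at the end of the buffer. */
static int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;                   /* truncated sequence */

        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)                        return 0;  /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)    return 0;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF)                   return 0;  /* beyond Unicode */

        i += need + 1;
    }
    return 1;
}
```

A caller would pass a true value as the last argument of newSVpvn_utf8 only when this returns 1, and otherwise fall back to treating the data as raw bytes or reporting an error.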


Re^4: Can someone please write a *working* JSON module (Send money)
by cnd (Acolyte) on Oct 24, 2021 at 11:06 UTC
    newSVpvn_utf8 sounds awesome! Is there some simple way to detect invalid UTF-8?

    I guess something, somewhere, knows this - since croak() is the bane of my existence right now: email subject lines which may or may not have been truncated somewhere are 100% guaranteed to spew invalid UTF-8 at *some* point.

    Is there some way perl can auto-magically handle UTF-16 as well? e.g. (from the RFC): "... UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E"." (those 4 bytes (and/or 12 characters) are also an example of why truncated text breaks everything I expect)

      See Encode::Unicode for the translations between the various Unicode encodings.

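      I think converting from UTF-16 to UTF-8 is merely a mathematical transformation between two encodings of the same code point, so you can easily model that. As for the backslash escapes: as far as I can tell from the RFC, a \uXXXX escape in JSON is always a UTF-16 code unit, never UTF-8 bytes. A character outside the Basic Multilingual Plane therefore shows up as two consecutive escapes, a high surrogate (D800 to DBFF) followed by a low surrogate (DC00 to DFFF). A lone surrogate, for example from truncated text, is the case you would have to detect.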
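To make the "mathematical transformation" concrete, here is a rough sketch in plain C of turning a surrogate pair into a code point and then into UTF-8 bytes. Both function names are made up for illustration, and the code assumes the two \uXXXX escapes have already been parsed into integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
 * Returns 0 if hi/lo do not form a valid high/low surrogate pair
 * (0 is never a valid result, since valid pairs yield >= 0x10000). */
static unsigned long surrogates_to_codepoint(unsigned hi, unsigned lo)
{
    if (hi < 0xD800 || hi > 0xDBFF) return 0;   /* not a high surrogate */
    if (lo < 0xDC00 || lo > 0xDFFF) return 0;   /* not a low surrogate */
    return 0x10000UL
         + (((unsigned long)(hi - 0xD800) << 10) | (unsigned long)(lo - 0xDC00));
}

/* Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * have room for at least 4 bytes. Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t codepoint_to_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                           /* lone surrogate */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                   /* beyond Unicode */
}
```

For the G clef example from the RFC, the pair D834/DD1E combines to U+1D11E, which encodes as the four bytes F0 9D 84 9E. A truncated pair (a high surrogate with nothing valid after it) makes the first function return 0, which is the point where a parser can substitute a replacement character instead of croaking.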
