http://qs321.pair.com?node_id=11105384


in reply to Is there some universal Unicode+UTF8 switch?

Try utf8::all. It's not universal, because it handles only the core functionality, not libraries. Your use case can be much simplified, though. I strongly suspect you have too much code. Consider: This means the following DWYW:
use LWP::UserAgent qw();
use JSON::MaybeXS qw(decode_json);

my $ua = LWP::UserAgent->new;
my $res = $ua->get('https://ru.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&list=allusers&auactiveusers&aufrom=%D0%91');
die $res->status_line unless $res->is_success;
my $json_OCTETS = $res->content;
my $all_users_CHARACTERS = decode_json $json_OCTETS;
my $continue_aufrom_CHARACTERS = $all_users_CHARACTERS->{continue}{aufrom};
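To check the octets-versus-characters distinction without touching the network, here is a minimal offline sketch. It assumes the core module JSON::PP as a stand-in for JSON::MaybeXS, and the JSON literal is a made-up replacement for `$res->content`, not real API output.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module; assumed stand-in for JSON::MaybeXS

# Hypothetical stand-in for $res->content: UTF-8 encoded octets, not characters.
# \xD0\x91\xD0\xB0 is the UTF-8 encoding of the two Cyrillic letters U+0411 U+0430.
my $json_OCTETS = qq({"continue":{"aufrom":"\xD0\x91\xD0\xB0"}});

# decode_json expects octets and returns a structure holding character strings.
my $all_users_CHARACTERS = decode_json $json_OCTETS;
my $continue_aufrom_CHARACTERS = $all_users_CHARACTERS->{continue}{aufrom};

# Four octets went in, two characters come out.
printf "%d characters, first is U+%04X\n",
    length $continue_aufrom_CHARACTERS,
    ord $continue_aufrom_CHARACTERS;
```

The `_OCTETS` and `_CHARACTERS` suffixes follow the naming used in the snippet above; they make it visible at each step which kind of string a variable holds.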
Your CGI script's templating system should take care to produce UTF-8 encoded octets. If you don't have one, then either one of is appropriate. The first variant is more robust.
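As a rough illustration of producing UTF-8 encoded octets by hand, the sketch below shows two common approaches: encoding explicitly with the core Encode module, and putting an `:encoding(UTF-8)` layer on the output handle. This is an assumption about what such code usually looks like, and the order here is not claimed to match the variants mentioned above. The variable names and sample string are hypothetical, and an in-memory handle replaces STDOUT so that both results can be compared.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical character string standing in for $continue_aufrom_CHARACTERS.
my $text_CHARACTERS = "\x{411}\x{430}";

# Approach A: encode explicitly. FB_CROAK dies on unencodable input;
# LEAVE_SRC keeps Encode from modifying $text_CHARACTERS in place.
my $via_encode_OCTETS =
    encode('UTF-8', $text_CHARACTERS, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Approach B: let an I/O layer encode on output. A real script would use
# binmode(STDOUT, ':encoding(UTF-8)'); an in-memory handle is used here
# only so that the result can be inspected.
open my $out, '>:encoding(UTF-8)', \my $via_layer_OCTETS or die $!;
print {$out} $text_CHARACTERS;
close $out or die $!;

print $via_encode_OCTETS eq $via_layer_OCTETS
    ? "both approaches yield the same octets\n"
    : "the approaches differ\n";
```

Whichever approach is used, it should be applied exactly once per handle: printing already encoded octets through an encoding layer encodes them a second time.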

Re^2: Is there some universal Unicode+UTF8 switch?
by VK (Novice) on Sep 01, 2019 at 19:39 UTC

    use utf8::all; sounds the most promising, thank you and I'll try it. The reason I didn't use it yet is that the main doc https://perldoc.perl.org/utf8.html doesn't have a single mention of this option - so either you know about utf8::all in advance, or you are out of luck.

    The JSON function shortcut decode_json has UTF-8 decoding hardcoded to "on". To turn it "off" and avoid double encoding, I had to use the full call, like JSON->new->utf8(0)->decode($response->content). If utf8::all solves this problem as well, then I can use the function shortcut. I will check everything later today.

    (Update) Nope, I rechecked: only the current long code works reliably for non-ASCII. For the sample URL above I make my $response = LWP call and then

    1. my $data1 = JSON->new->utf8(0)->decode($response->content);
    2. my $data2 = decode_json($response->content);
    3. my $data3 = $response->decoded_content;
    and then my $test = $data1->{query}->{allusers}[0]->{name};

    1) is always working for my needs. 2) is working if called in some obvious scalar context. One tries slice referenced array or anything complex - it falls to the "Perl branded jam" with Ñ Ð and the like. 3) is stably DOA (dead on arrival), so the same as 2) but right away.
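One way to see what calls 1 and 2 actually return is to compare them offline on the same input. The sketch below is an illustration under stated assumptions, not a reproduction of the script discussed here: it uses the core module JSON::PP in place of JSON, and a hand-written JSON literal as a hypothetical stand-in for `$response->content`. Call 3 is not reproduced, since `decoded_content` returns a string rather than a decoded data structure; a third case instead shows `utf8(0)` applied to input that was decoded first.

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP ();   # core module; assumed stand-in for JSON

# Hypothetical stand-in for $response->content: UTF-8 encoded octets.
my $content_OCTETS =
    qq({"query":{"allusers":[{"name":"\xD0\x91\xD0\xB0"}]}});

# utf8(0) expects characters. Given raw octets, each byte becomes one
# character, so the name starts with U+00D0 (which displays as the letter Eth).
my $data1 = JSON::PP->new->utf8(0)->decode($content_OCTETS);

# utf8(1), which is what decode_json uses, expects octets and returns characters.
my $data2 = JSON::PP->new->utf8(1)->decode($content_OCTETS);

# utf8(0) on input that has been decoded to characters beforehand.
my $data3 = JSON::PP->new->utf8(0)->decode(decode('UTF-8', $content_OCTETS));

for my $data ($data1, $data2, $data3) {
    my $name = $data->{query}{allusers}[0]{name};
    printf "length %d, first code point U+%04X\n", length $name, ord $name;
}
```

In this sketch the first result holds four one-byte characters while the other two hold the two Cyrillic characters. A string like the first one can still look right when printed to a handle without an encoding layer, because the original bytes pass through unchanged.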

    So utf8::all should be written and extended to some utf8::all_throughout. "Written" means to the reliability and stability level needed to be included in prominent Perl distributions. Until then the answer to my initial question seems negative.

      slice referenced array or anything complex - it falls to the "Perl branded jam"
      I'm sceptical about that claim. Show your code.
Re^2: Is there some universal Unicode+UTF8 switch?
by Anonymous Monk on Sep 02, 2019 at 11:29 UTC
    Just stylin:
    STDOUT->binmode(':encoding(UTF-8)'); STDOUT->print($continue_aufrom_CHARACTERS);