Re: JSON::XS Cyrillic unicode not saving properly

by choroba (Cardinal)
on Mar 20, 2021 at 23:46 UTC ( [id://11130020]=note )


in reply to JSON::XS Cyrillic unicode not saving properly

You don't seem to use pg_enable_utf8.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: JSON::XS Cyrillic unicode not saving properly
by cormanaz (Deacon) on Mar 21, 2021 at 15:00 UTC
    I added $dbh->{pg_enable_utf8} = 1; and am getting the same result. Is that the correct way to do it?
      If you're using a recent version of DBD::Pg, you shouldn't need to set the parameter at all. The only way I was able to get the wrong string back was to set it to 0 without setting the binmode of STDOUT. Try experimenting with the following script:
      #!/usr/bin/perl
      use warnings;
      use strict;
      use utf8;    # string literals in this file are decoded from UTF-8
      use feature qw{ say };
      
      use DBI;
      use Encode;
      
      # pg_enable_utf8: 1 = always decode, -1 = DBD::Pg default, 0 = never decode
      for my $utf8 (1, -1, 0) {
          # Send characters when decoding is on, UTF-8 encoded bytes when it is off.
          my $string = $utf8 ? 'Кирилл цагаан толгой'
                             : encode('UTF-8', 'Кирилл цагаан толгой');
      
          my $db = 'DBI'->connect('dbi:Pg:dbname=postgres', "", "");
          say $db->{pg_enable_utf8} = $utf8;
      
          $db->do('CREATE TABLE IF NOT EXISTS cyr (t TEXT)');
          $db->do('DELETE FROM cyr');
      
          my $insert = $db->prepare('INSERT INTO cyr (t) VALUES (?)');
          $insert->execute($string);
      
          my $select = $db->prepare('SELECT t FROM cyr');
          $select->execute;
          # Encode on output only when the rows come back as characters.
          binmode *STDOUT, $utf8 ? ':encoding(UTF-8)' : ':raw';
          while (my @row = $select->fetchrow_array) {
              say @row;
          }
      }
      

      The script needs to be saved as UTF-8.
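
To make the distinction the script relies on a bit more concrete, here is a minimal sketch of the difference between a character string and its UTF-8 encoded byte string. The variable names ($chars, $bytes, $round_trip) are made up for illustration and do not appear in the script above; like that script, this file has to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                        # literals in this file are decoded from UTF-8
use Encode qw(encode decode);

# Illustrative names only; not taken from the script in the thread.
my $chars = 'Кирилл';                   # character string (code points)
my $bytes = encode('UTF-8', $chars);    # byte string (UTF-8 octets)

binmode STDOUT, ':encoding(UTF-8)';
printf "characters: %d, octets: %d\n", length($chars), length($bytes);

# Decoding the octets gives back the original character string.
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok\n" if $round_trip eq $chars;
```

The character string is what you want to hand to a filehandle that has an :encoding(UTF-8) layer; the byte string is what you want to hand to a :raw filehandle.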

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Output is three lines of Cyrillic (I don't know how to get this website to accept Cyrillic chars). Same thing when I write to a file instead of STDOUT.

        I think the database transaction is not the problem. When I run Lingua::Identify on the items in the DB then compare the detected language to the item in the DB, it's getting the languages correct. It couldn't do that if the DB was returning non-Cyrillic characters, right?

        This suggests the problem is in the output. But I am setting binmode on the output file with the correct encoding, just like in your example. I also tried declaring UTF-8 encoding in the open statement, but am still getting the ASCII output. This is perplexing.
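
For reference, a small sketch of what declaring the encoding in the open statement looks like when the data is a decoded character string. The temporary file and the variable names are assumptions made for the example, not the poster's actual code.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Hypothetical sample text and temp file, used only for this sketch.
my $text = 'цагаан толгой';             # decoded character string

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# The layer turns characters into UTF-8 octets on the way out.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Reading raw shows the octets that actually landed on disk.
open my $raw, '<:raw', $path or die "Cannot read $path: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# Reading through the same layer decodes them back into characters.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $decoded = do { local $/; <$in> };
close $in;

print "same text after round trip\n" if $decoded eq $text;
```

This only behaves as expected if what is printed is a character string. If the data has already been encoded to UTF-8 bytes somewhere upstream, the layer will encode it a second time.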

        Aha! I have verified that it's not the DB transaction by running the code in a visual debugger. After the query, @items contains tweets in Cyrillic. I printed those out to a flat file, and it opens as Cyrillic. The problem is in JSON::XS.

        The docs say you have to use the OO interface and enable utf8 encoding. I tried doing this by changing the print statement to print OUT JSON::XS->new->utf8->encode($sample); but that still produces a JSON file with ASCII characters. The docs on the OO interface are a little confusing. Does anyone know the right way to do this?
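
A hedged sketch of how the utf8 and ascii options pair up with the output handle. It uses the core module JSON::PP so that it runs on a stock Perl; JSON::XS provides the same new, utf8, ascii and encode methods. The contents of $sample here are a made-up stand-in, and whether a mismatch between encoder and filehandle is the actual cause in this thread is an assumption, not something the posts confirm.

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;   # swapped in for JSON::XS so the sketch needs no CPAN install

# Hypothetical stand-in for the real data structure.
my $sample = { text => 'Кирилл' };

# utf8 enabled: encode returns UTF-8 octets.
# Print these to a :raw handle, with no :encoding layer on top.
my $octets = JSON::PP->new->utf8->encode($sample);

# utf8 not enabled: encode returns a character string.
# Print this to a handle opened with :encoding(UTF-8).
my $chars = JSON::PP->new->encode($sample);

# ascii enabled: every non-ASCII character becomes a \uXXXX escape.
my $escaped = JSON::PP->new->ascii->encode($sample);

binmode STDOUT, ':raw';
print $octets,  "\n";
print $escaped, "\n";

# Both forms decode back to the same data.
my $from_octets  = decode_json($octets)->{text};
my $from_escaped = decode_json($escaped)->{text};
```

So ->utf8->encode goes with a raw handle, and plain ->encode goes with an :encoding(UTF-8) handle. Combining ->utf8 with an encoding layer encodes the text twice. If the file is full of \uXXXX sequences, it is worth checking whether ascii is enabled somewhere; those escapes are still valid JSON and decode back to Cyrillic.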
