http://qs321.pair.com?node_id=1233460

1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I've run into an issue with sending UTF-8 to Postgres, using DBIx::Class and DBD::Pg.

Using DBIC_TRACE=1:

INSERT INTO method_data ( method_id, params) VALUES ( ?, ? ) RETURNING method_data_id: '59807', '{"addressLine1":"堅尼地道105號","ms-correlationid":"D2559182-719E-11E9-9E71-271EACAC0E00","city":"灣仔","country":"HK","user":"ut-testadmin"}'
Using DBI_TRACE:
     -> bind_param for DBD::Pg::st (DBI::st=HASH(0x7f982408c7f0)~0x7f982408d2b8 1 59813 undef)
    <- bind_param= ( 1 ) 1 items at DBI.pm line 1891
    -> bind_param for DBD::Pg::st (DBI::st=HASH(0x7f982408c7f0)~0x7f982408d2b8 2 '{"addressLine1":"�.尼�.��..105�..","country":"HK","city":"�.��.","ms-correlationid":"019A09E2-71A3-11E9-971A-C122ACAC0E00","user":"ut-testadmin"}' undef)
Making the query shown by DBIC_TRACE in the CLI succeeds normally.

Is this a known problem? I couldn't find much on the interwebs about it. I don't see anything in the source for DBD::Pg concerning encoding: are the C libs doing something under the hood?

Thanks in advance.


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re: Postgres UTF-8 woes
by choroba (Cardinal) on May 08, 2019 at 18:57 UTC
    Do you send decoded unicode or encoded (e.g. UTF-8)? Have you set client or server encoding?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Hi choroba, thank you for replying.

      Client and server encoding are both set to UTF-8:

      Activity=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access +privileges -----------+---------+----------+-------------+-------------+--------- +------------ Activity | ******* | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres | ******* | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | ******* | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/***** +** + | | | | | *******= +CTc/******* template1 | ******* | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/***** +** + | | | | | *******= +CTc/******* (6 rows) Activity=# show client_encoding; client_encoding ----------------- UTF8 (1 row)
      The data is passed decoded from UTF-8, which I believe is correct. DBIx seems to handle it as expected (given the encodings specified in the schema), but the SQL then produced by DBD::Pg/DBI is garbled, as shown. Passing data encoded to UTF-8 to DBIx results in even more garbled content in the DB.


      The way forward always starts with a minimal test.
        What version of DBD::Pg do you use? From the Changes it seems 3.3.0 is the minimum version with reasonable UTF-8 support, and probably even 3.6.0.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Postgres UTF-8 woes
by erix (Prior) on May 08, 2019 at 21:48 UTC

    I don't really see through your problem but have you tried various settings for pg_enable_utf8 ? (There's some info in perldoc DBD::Pg).

    In a quick test with your data and DBI/DBD::Pg but without DBIx::Class, I needed $dbh->{pg_enable_utf8} = 0;

      Thanks erix, I'll try that. My understanding of pg_enable_utf8 was that it only affects data retrieved from the DB, and is not normally needed with Pg >3.0, but I did try setting it to 1 from the default -1. I'll see if 0 affects it.


      The way forward always starts with a minimal test.

        Thank you very much indeed for the tip Master erix. Your help solved my problem.


        The way forward always starts with a minimal test.