Postgres UTF-8 woes

1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I've run into an issue with sending UTF-8 to Postgres, using DBIx::Class and DBD::Pg.

Using DBIC_TRACE=1:

INSERT INTO method_data ( method_id, params) VALUES ( ?, ? ) RETURNING method_data_id: '59807', '{"addressLine1":"堅尼地道105號","ms-correlationid":"D2559182-719E-11E9-9E71-271EACAC0E00","city":"灣仔","country":"HK","user":"ut-testadmin"}'

Using DBI_TRACE:

     -> bind_param for DBD::Pg::st (DBI::st=HASH(0x7f982408c7f0)~0x7f982408d2b8 1 59813 undef)
    <- bind_param= ( 1 ) 1 items at DBI.pm line 1891
    -> bind_param for DBD::Pg::st (DBI::st=HASH(0x7f982408c7f0)~0x7f982408d2b8 2 '{"addressLine1":"�.尼�.��..105�..","country":"HK","city":"�.��.","ms-correlationid":"019A09E2-71A3-11E9-971A-C122ACAC0E00","user":"ut-testadmin"}' undef)

Making the query shown by DBIC_TRACE in the CLI succeeds normally.

Is this a known problem? I couldn't find much on the interwebs about it. I don't see anything in the source for DBD::Pg concerning encoding: are the C libs doing something under the hood?

Thanks in advance.

The way forward always starts with a minimal test.

Comment on Postgres UTF-8 woes Select or Download Code

Replies are listed 'Best First'.
Re: Postgres UTF-8 woes by choroba (Cardinal) on May 08, 2019 at 18:57 UTC
Do you send decoded unicode or encoded (e.g. UTF-8)? Have you set client or server encoding? `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l]
Re^2: Postgres UTF-8 woes by 1nickt (Canon) on May 08, 2019 at 21:13 UTC
Hi choroba, thank you for replying. Client and server encoding are both set to UTF-8: Activity=# \l List of databases Name \| Owner \| Encoding \| Collate \| Ctype \| Access +privileges -----------+---------+----------+-------------+-------------+--------- +------------ Activity \| ***** \| UTF8 \| en_US.UTF-8 \| en_US.UTF-8 \| postgres \| *** \| UTF8 \| en_US.UTF-8 \| en_US.UTF-8 \| template0 \| *** \| UTF8 \| en_US.UTF-8 \| en_US.UTF-8 \| =c/* + + \| \| \| \| \| *****= +CTc/*** template1 \| *** \| UTF8 \| en_US.UTF-8 \| en_US.UTF-8 \| =c/* + + \| \| \| \| \| *****= +CTc/***** (6 rows) Activity=# show client_encoding; client_encoding ----------------- UTF8 (1 row) [download] The data is passed decoded from UTF-8, which I believe is correct. DBIx seems to handle it as expected (given the encodings specified in the schema), but the SQL then produced by DBD::Pg/DBI is garbled, as shown. Passing data encoded to UTF-8 to DBIx results in even more garbled content in the DB. The way forward always starts with a minimal test.	[reply] [d/l]
Re^3: Postgres UTF-8 woes by choroba (Cardinal) on May 09, 2019 at 08:04 UTC
What version of DBD::Pg do you use? From the Changes it seems 3.3.0 is the minimum version with reasonable UTF-8 support, and probably even 3.6.0. `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l]
Re^4: Postgres UTF-8 woes by 1nickt (Canon) on May 09, 2019 at 10:40 UTC
Re: Postgres UTF-8 woes by erix (Prior) on May 08, 2019 at 21:48 UTC
I don't really see through your problem but have you tried various settings for pg_enable_utf8 ? (There's some info in `perldoc DBD::Pg`). In a quick test with your data and DBI/DBD::Pg but without DBIx::Class, I needed `$dbh->{pg_enable_utf8} = 0;`	[reply] [d/l] [select]
Re^2: Postgres UTF-8 woes by 1nickt (Canon) on May 08, 2019 at 22:56 UTC
Thanks erix, I'll try that. My understanding of `pg_enable_utf8` was that it only affects data retrieved from the DB, and is not normally needed with Pg >3.0, but I did try setting it to 1 from the default -1. I'll see if 0 affects it. The way forward always starts with a minimal test.	[reply] [d/l]
Re^3: Postgres UTF-8 woes by 1nickt (Canon) on May 09, 2019 at 11:40 UTC
Thank you very much indeed for the tip Master erix. Your help solved my problem. The way forward always starts with a minimal test.	[reply]

Back to Seekers of Perl Wisdom