in reply to Re^2: How to Encode/Decode double encoded string.
in thread How to Encode/Decode double encoded string.
Thanks for the clarifications! It is relevant that the data comes from a Postgres database: once a database is involved, a lot of encoding and decoding happens behind the scenes. Postgres has a configurable server encoding and a configurable client encoding, and either or both might have changed between the legacy and the current application.
The string � is the UTF-8-encoded form of the Unicode replacement character (U+FFFD). You get it from software that tries to decode a string as UTF-8 although it contains bytes that are not valid UTF-8, and then encodes the result as UTF-8 again. My guess is that the decoding step is being fed plain ISO-8859-1 (Latin-1) àáâä.
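To make that mechanism concrete, here is a minimal sketch using the core Encode module. The input bytes are my assumption about what the legacy data looks like (àáâä stored as Latin-1); the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: "àáâä" stored as ISO-8859-1, one byte per character.
my $latin1_bytes = "\xE0\xE1\xE2\xE4";

# A lenient UTF-8 decode does not die on malformed input; it substitutes
# U+FFFD (the replacement character) for what it cannot decode.
my $decoded = decode('UTF-8', $latin1_bytes);

# Encoding the result as UTF-8 turns every U+FFFD into the bytes EF BF BD.
my $reencoded = encode('UTF-8', $decoded);

printf "after decode: %vX\n", $decoded;
printf "after encode: %vX\n", $reencoded;
```

Once this has happened the original characters are gone: every damaged character ends up as the same three bytes, so the broken strings cannot be repaired by decoding them again.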
There is a chance that the bogus decoding happens in DBD::Pg, Perl's Postgres database driver. You can check that by setting the database handle attribute pg_enable_utf8 to 0 when connecting. Your application then gets to examine the "raw" contents and can decode them accordingly.
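A rough sketch of what that could look like follows. The helper names connect_raw and decode_raw are hypothetical, as are the DSN and credentials you would pass in. The fallback to ISO-8859-1 is only my guess at the legacy encoding, and it is a heuristic: Latin-1 data that happens to be valid UTF-8 would be misread.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: connect with pg_enable_utf8 switched off, so that
# DBD::Pg hands back strings without decoding them as UTF-8.
sub connect_raw {
    my ($dsn, $user, $password) = @_;
    require DBI;
    return DBI->connect($dsn, $user, $password,
        { RaiseError => 1, pg_enable_utf8 => 0 });
}

# Hypothetical helper: try a strict UTF-8 decode first; if the bytes are
# not valid UTF-8, assume they are ISO-8859-1 (an assumption, see above).
sub decode_raw {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return defined $text ? $text : decode('ISO-8859-1', $bytes);
}
```

With a handle from connect_raw, you would run each fetched column value through decode_raw (or through whatever decoding the inspection of your data suggests) before using it.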
A convenient way to examine strings is printf with the "v" flag, which prints the ordinal of each character in the string, joined by dots:

    printf "%vx", $string;

From there you can decide how to proceed. Probably you need to rebuild the data with a consistent encoding.
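For orientation, this is what "%vx" shows for the cases discussed in this post. The sample strings are made up for illustration, and ordinals is just a hypothetical wrapper around sprintf.

```perl
use strict;
use warnings;

# Hypothetical helper: hex ordinals of each character, joined by dots.
sub ordinals { return sprintf '%vx', $_[0] }

my @samples = (
    [ 'a-grave, a-acute as Latin-1 bytes' => "\xE0\xE1" ],          # e0.e1
    [ 'a-grave, a-acute as UTF-8 bytes'   => "\xC3\xA0\xC3\xA1" ],  # c3.a0.c3.a1
    [ 'replacement char as UTF-8 bytes'   => "\xEF\xBF\xBD" ],      # ef.bf.bd
    [ 'replacement char, decoded'         => "\x{FFFD}" ],          # fffd
);

printf "%-36s %s\n", $_->[0], ordinals($_->[1]) for @samples;
```

If the raw data shows ef.bf.bd, the damage is already stored in the database. If it shows single bytes like e0.e1, the stored data is Latin-1 and only the decoding on the way out is wrong. Note that a correctly decoded à also prints as e0, so you need to know whether you are looking at bytes or at characters.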
Re^4: How to Encode/Decode double encoded string.
by Anonymous Monk on Sep 22, 2020 at 11:08 UTC
by haj (Vicar) on Sep 22, 2020 at 11:45 UTC