Re: How to Encode/Decode double encoded string.

To decode characters properly, you need to understand how they were encoded. If all you have are the plain strings, then some ... guesswork ... can't be avoided. Also note that copy-pasting encoded UTF-8 strings doesn't work well: the UTF-8 encoding of рстф contains bytes which are non-printable control characters. Here on PerlMonks these are converted to spaces, so I can't even work with your example string.

Your second test string does not look doubly UTF-8 encoded. Try to find out how you created that string, and then we can work from there. If I doubly encode рстф, I get:

УТ УТЁУТЂУТЄ.
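For reference, here is a minimal sketch of what "doubly encoded" means in Perl, using the core Encode module. The sample string is an assumption (four accented Latin-1 letters written as code points), not the poster's actual data, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: four accented letters given as code points,
# so the example does not depend on this file's own encoding.
my $text = "\x{e0}\x{e1}\x{e2}\x{e4}";

# First encoding: characters -> UTF-8 bytes (the correct, single step).
my $once = encode('UTF-8', $text);

# The mistake: the UTF-8 bytes are taken for Latin-1 characters
# and then encoded a second time.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

# Undoing it requires two decoding steps as well.
my $repaired = decode('UTF-8', encode('ISO-8859-1', decode('UTF-8', $twice)));

# Print the bytes as hex so that control characters stay visible.
printf "once:     %s\n", join ' ', map { sprintf '%02x', ord } split //, $once;
printf "twice:    %s\n", join ' ', map { sprintf '%02x', ord } split //, $twice;
printf "repaired: %vx\n", $repaired;
```

If decoding twice does not give back sensible text, the data probably went through something other than a plain double encoding.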

Nitpick: your question would be easier to read with a little formatting. You can edit your post to add HTML for lists, and code (or examples) are better wrapped between <code> and </code> tags.


Replies are listed 'Best First'.
Re^2: How to Encode/Decode double encoded string. by Anonymous Monk on Sep 22, 2020 at 05:59 UTC
Hi Haj, Thank you for the reply. I basically have the two strings below in the Postgres database: `select title from TABLE; title ------------------------------------------- this is a test international яПНяПНяПНяПН # This string was inserted into the database using Perl version 5.24.1 this is a test international У УЁУЂУЄ # This string was inserted into the database using an older version of Perl. I don't know the exact version, but it seems this version doesn't handle UTF-8 by default. It was inserted using the legacy version of the application.` How can we get around this issue? As I mentioned earlier, the first string gives the correct result when the Encode module is used, but the second doesn't. Thank you.
Re^2: How to Encode/Decode double encoded string. by Anonymous Monk on Sep 22, 2020 at 06:06 UTC
Hi Haj, Please disregard my previous reply, as I mixed up the two strings. Thank you for the reply. I basically have the two strings below in the Postgres database: `select title from TABLE; title ------------------------------------------- this is a test international яПНяПНяПНяПН # This string was inserted into the database using an older version of Perl. I don't know the exact version, but it seems this version doesn't handle UTF-8 by default. It was inserted using the legacy version of the application. this is a test international У УЁУЂУЄ # This string was inserted into the database using Perl version 5.24.1` Thank you
Re^3: How to Encode/Decode double encoded string. by haj (Vicar) on Sep 22, 2020 at 08:59 UTC
Thanks for the clarifications! It is relevant that the data comes from a Postgres database. A lot of encoding happens behind the scenes when a database is involved. Postgres has a configurable server encoding and a configurable client encoding; either one or both might have changed between the legacy and the current application.

The string `яПН` is a UTF-8-encoded version of the "Unicode replacement character". You get this from software which tries to decode as UTF-8 a string that contains bytes which are not valid UTF-8, and then encodes the result as UTF-8. I guess that the decoding step gets fed with plain ISO-latin `рстф`.

There is a chance that the bogus decoding happens in Perl's Postgres database driver. You can check that by setting the DBH option `pg_enable_utf8` to zero when connecting. Your application will then be able to examine the "raw" contents and decode accordingly. A convenient way to examine strings is `printf` with the "v" flag: `printf "%vx",$string`

From there you can decide how to proceed. Probably you need to rebuild the data with a consistent encoding.
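To illustrate how replacement characters can end up in stored data, here is a small sketch using only the core Encode module. The input bytes are a hypothetical stand-in for legacy Latin-1 data, and the variable names are invented for the example; it does not touch the database or the driver.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical legacy input: four Latin-1 bytes that are not valid UTF-8.
my $latin1_bytes = "\xE0\xE1\xE2\xE4";

# Decoding them as UTF-8 with Encode's default (lenient) check
# substitutes U+FFFD for malformed sequences instead of dying.
my $decoded = decode('UTF-8', $latin1_bytes);

# Encoding the result again produces the bytes EF BF BD for each U+FFFD.
# The original letters cannot be recovered from these bytes.
my $stored = encode('UTF-8', $decoded);

# sprintf with the "v" flag joins the ordinal of each character with dots.
my $decoded_dump = sprintf '%vx', $decoded;
my $stored_dump  = sprintf '%vx', $stored;

print "decoded: $decoded_dump\n";
print "stored:  $stored_dump\n";
```

Running the same kind of `%vx` dump on the raw values fetched with `pg_enable_utf8` set to zero should show whether a row holds replacement characters (data lost) or doubly encoded text (still recoverable).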
Re^4: How to Encode/Decode double encoded string. by Anonymous Monk on Sep 22, 2020 at 11:08 UTC
Hi Haj, Thank you for the reply. Yes, I did set pg_enable_utf8 = 0; after that I see the same raw strings as above in the web application. Please note that whatever I see in the database, I see as-is in the application too. The Postgres server encoding and client encoding are both 'UTF-8'. I tried changing client_encoding to SQL_ASCII, but it didn't help. I am still not quite sure what exactly needs to be done to get around this issue. Thank you
Re^5: How to Encode/Decode double encoded string. by haj (Vicar) on Sep 22, 2020 at 11:45 UTC

