Re^2: Lost in encodings by Skeeve (Parson)
on Feb 10, 2020 at 07:48 UTC
And thanks once more for your long and helpful reply. I have now tested a bit more, following your tip to use decoded_content.
When I now look at what LWP reads, I really do get the correct umlaut, which I can also see once I set binmode on the debugger's I/O handles.
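A minimal sketch of that round trip, without any network access. The byte string is a made-up stand-in for a UTF-8 response body, and Encode's decode plays the role that decoded_content plays for a real response; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for raw bytes off the wire: "für" encoded as UTF-8 (4 bytes).
my $bytes = "f\xC3\xBCr";

# Decoding turns the bytes into Perl characters (3 characters).
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# An encoding layer on the handle encodes characters again on output,
# so the umlaut shows up correctly on a UTF-8 terminal.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

Without the binmode call, Perl would write the character U+00FC as the single byte 0xFC, which a UTF-8 terminal cannot display as an umlaut.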
The problem seems to lie in the output of MIME::Lite::TT::HTML. Looking at the code, one can provide an input and an output charset. When you don't, MIME::Lite::TT::HTML assumes that what you provide is already in the correct charset :( So what I would need to do is provide the charset of Perl's internal strings, which I assume doesn't exist as a named charset. I think I'll have to patch MIME::Lite::TT::HTML…
As you wrote:
Now when you write the data, you need to encode it to UTF-8. I suppose (but didn't test right now) that MIME::Lite::TT::HTML does the right thing and encodes for you if you provide the Charset attribute on the constructor. =FC is QP-encoding for an ISO-8859-1 'ü' and indeed wrong here. So if you did provide Charset => 'utf8', then shout up, I'll write some tests.
So here is my shout out. ;)
I assume the relevant part that needs to be patched is lines 115-117 of https://metacpan.org/release/MIME-Lite-TT-HTML/source/lib/MIME/Lite/TT/HTML.pm:
Here I would provide "something" for the internal Perl encoding. Maybe '*internal*'?
Starting at line 156, the code looks dubious. "remove_utf8_flag" does not seem correct after what I learned from you and others in my threads.
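To illustrate why merely dropping the UTF-8 flag looks suspicious, here is a small sketch. It does not reproduce the module's remove_utf8_flag; Encode::_utf8_off is used as an assumed stand-in for "clear the flag", and all variable names are invented. It compares flag clearing with a real encode step and shows the quoted-printable result of each.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::QuotedPrint qw(encode_qp);

# One Perl character, U+00FC. Perl stores this literal as the single
# byte 0xFC with the UTF-8 flag off.
my $text = "\x{FC}";

# Clearing the flag changes nothing here: the byte stays 0xFC.
my $flag_cleared = $text;
Encode::_utf8_off($flag_cleared);

# The same character, but stored upgraded (internally 0xC3 0xBC, flag on).
# Clearing the flag now exposes the two internal bytes.
my $upgraded = $text;
utf8::upgrade($upgraded);
Encode::_utf8_off($upgraded);

# Explicit encoding gives UTF-8 bytes whatever the internal storage was.
my $encoded = encode('UTF-8', $text);

# An empty end-of-line argument suppresses soft line breaks.
my $qp_flag     = encode_qp($flag_cleared, '');
my $qp_upgraded = encode_qp($upgraded, '');
my $qp_encoded  = encode_qp($encoded, '');

print "flag cleared:           $qp_flag\n";      # =FC
print "upgraded, flag cleared: $qp_upgraded\n";  # =C3=BC
print "encoded:                $qp_encoded\n";   # =C3=BC
```

The first line is the same =FC seen in the broken mail, so the result of flag manipulation depends on how Perl happens to store the string, while encode does not.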
And then the from_to encoding should be changed, I guess, to:
What do you think?
Update: I've created a patch which allows one to tell MIME::Lite::TT::HTML that the text provided ($charset_input) is in Perl's internal representation. With this in place, my script works as expected.
Unfortunately, it seems the module is abandoned, as the issues opened for it are 12 years old :(