Re^3: Lost in encodings

in reply to Re^2: Lost in encodings
in thread Lost in encodings

Good job tracking that down to the root cause!

When I wrote my previous response, I failed to check the version history of MIME::Lite::TT::HTML. Otherwise I would not have made the assumption that the module does the right thing. It does not, as you found out. The current release is from 2007 (Perl 5.10-ish). Back then, Unicode support in Perl was rather new and sometimes bumpy, module authors didn't have much experience with it, and not all CPAN modules supported it.

Having looked into the module's source code: the module works with all input in byte-encoded form. Today this is considered bad practice since it breaks a lot of Perl's string processing features, including those available from Template Toolkit. The module also assumes that the subject is encoded in the same encoding as the template files, which is even more questionable. So yes, patching (or subclassing) the module's methods encode_subject and encode_body would be the way to go. Filing an issue for the module would also be fine, but judging by the current list of open issues, it doesn't look like the author is still actively maintaining the module.
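To illustrate what "breaks string processing" means in practice, here is a minimal sketch using only core Perl and Encode. It is my own example and not taken from the module; the sample text and the helper count_diff are made up for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 octets for "été", as read from a file without any decoding layer.
my $bytes = "\xC3\xA9t\xC3\xA9";
# The same text as a decoded Perl character string.
my $chars = decode('UTF-8', $bytes);

# length() counts octets in one case and characters in the other.
printf "length: bytes=%d chars=%d\n", length($bytes), length($chars);

# uc() only touches the ASCII "t" in the byte string,
# but uppercases all three letters of the character string.
my $uc_bytes = uc $bytes;
my $uc_chars = uc $chars;

printf "uc changed %d position(s) in bytes, %d in chars\n",
    count_diff($bytes, $uc_bytes), count_diff($chars, $uc_chars);

# Hypothetical helper: number of positions where two equal-length strings differ.
sub count_diff {
    my ($x, $y) = @_;
    return scalar grep { substr($x, $_, 1) ne substr($y, $_, 1) }
        0 .. length($x) - 1;
}
```

The same effect hits regular expressions, substr and anything a template does with its parameters.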

There is no keyword for Perl's internal encoding (because, by definition, these strings are decoded). So you could either invent one like *internal* or even use an undefined value as an indicator that your input should not be decoded. Your fix should do the trick if you want to go down that path.
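The undefined-value variant could look roughly like this. This is only a sketch: to_output_bytes is a hypothetical helper of mine, not a function of MIME::Lite::TT::HTML, and the parameter names merely mirror the module's $charset_input and $charset_output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: an undefined $charset_input means
# "the text is already a decoded Perl string, do not decode it again".
sub to_output_bytes {
    my ($text, $charset_input, $charset_output) = @_;
    my $chars = defined $charset_input
        ? decode($charset_input, $text)   # octets in, characters out
        : $text;                          # already characters
    return encode($charset_output, $chars);
}

# A decoded Perl string containing the euro sign (U+20AC).
my $from_chars = to_output_bytes("5 \x{20AC}", undef, 'UTF-8');

# The same text as ISO-8859-15 octets, where 0xA4 is the euro sign.
my $from_bytes = to_output_bytes("5 \xA4", 'ISO-8859-15', 'UTF-8');

print $from_chars eq $from_bytes ? "same output\n" : "different output\n";
```

Both calls end up with identical UTF-8 octets, so callers who already hold decoded strings no longer have to encode them first just to satisfy the interface.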

remove_utf8_flag is indeed scary and another example of an attempt to achieve cancellation of errors. I am pretty sure that TT processing could result in this flag being set, even if the TT results are pure ASCII. Instead of re-evaluating his assumptions, the author just killed the flag to make the string fit his expectations. With current Perl you wouldn't get rid of the flag like that, and Encode::decode will happily decode strings which already have the flag set.
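To see why the flag is nothing to be afraid of, here is a small sketch with core functions only. The sample strings are mine, and the non-ASCII case reflects the behaviour of Encode as I know it from reasonably current releases.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $plain   = "Hello";
my $flagged = "Hello";
utf8::upgrade($flagged);    # same characters, different internal storage

# The flag is set, yet the two strings compare equal.
my $is_flagged = utf8::is_utf8($flagged) ? 1 : 0;
my $same       = $plain eq $flagged      ? 1 : 0;

# decode() accepts the flagged string without complaint.
my $decoded = decode('UTF-8', $flagged);

# The same holds for non-ASCII octets that happen to carry the flag.
my $octets = "\xC3\xA9";    # UTF-8 octets for U+00E9
utf8::upgrade($octets);
my $e_acute = decode('UTF-8', $octets);

printf "flagged=%d same=%d decoded_length=%d\n",
    $is_flagged, $same, length($e_acute);
```

The flag describes how Perl stores the string internally, not what the string contains, so code that inspects or clears it is usually compensating for a mistake made elsewhere.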

Another alternative, with more coding but better alignment with current practice, would be to get rid of $charset_input and expect that the subject and the template parameters are Perl strings. You'd still need TT's ENCODING config because UTF-8 text in files needs decoding, and $charset_output is also still required because MIME::Lite explicitly says that it expects encoded strings.
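A rough sketch of the output side of that design, under my own assumptions: encode_for_mail is a hypothetical function, the use of Encode's MIME-Header encoding for the subject is my choice and not necessarily what the module does, and the TT step is left out (there you would pass ENCODING => 'utf8' to Template->new so that template files are decoded on read).

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical function: subject and body arrive as decoded Perl strings,
# encoding happens exactly once, right before the data goes to MIME::Lite.
sub encode_for_mail {
    my ($subject, $body, $charset_output) = @_;
    return (
        Subject => encode('MIME-Header', $subject),   # RFC 2047 encoded word
        Data    => encode($charset_output, $body),    # octets for the body
    );
}

my $subject = "Gr\x{FC}\x{DF}e";                  # decoded characters
my $body    = "Price: 5 \x{20AC}\n";              # as produced by TT
my %mail    = encode_for_mail($subject, $body, 'UTF-8');

print "Subject: $mail{Subject}\n";
```

There is no $charset_input anywhere: whoever reads data from the outside world decodes it at that point, and everything in between works on characters.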
