MIME::Lite::TT::Html - issues with umlauts

by Skeeve (Parson)
on Feb 04, 2020 at 09:03 UTC (#11112352=perlquestion)

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow monks,

I have an issue with MIME::Lite::TT::HTML: it does not properly handle umlauts (or UTF-8 in general).

This is my sample code:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use MIME::Lite::TT::HTML;
    use Encode;

    my %params;
    $params{umlaut} = 'fbr';
    Encode::_utf8_on($params{umlaut});

    my %options = (
        INCLUDE_PATH => '.',
        ENCODING     => 'UTF-8',
    );

    my $msg = MIME::Lite::TT::HTML->new(
        From        => 'admin@example.com',
        To          => 'frank@example.com',
        Subject     => 'Your recent purchase',
        Template    => {
            text => 'revsys.txt.tt',
            html => 'revsys.html.tt',
        },
        TmplOptions => \%options,
        TmplParams  => \%params,
    );

    binmode STDOUT, ':utf8';
    print "Should be: ", $params{umlaut},$/;
    print '-' x 79,$/;
    print $msg->as_string;

And these are the templates:

    revsys.txt.tt:
    text: [% umlaut %]

    revsys.html.tt:
    <h1>[% umlaut %]</h1>

The output is:

    Should be: fbr
    -------------------------------------------------------------------------------
    Content-Transfer-Encoding: binary
    Content-Type: multipart/alternative; boundary="_----------=_158080661890"
    MIME-Version: 1.0
    X-Mailer: MIME::Lite 3.031 (F2.85; T2.17; A2.21; B3.15; Q3.13)
    Subject: =?US-ASCII?B?WW91ciByZWNlbnQgcHVyY2hhc2U=?=
    Date: Tue, 4 Feb 2020 08:56:58 +0000
    To: frank@example.com
    From: admin@example.com

    This is a multi-part message in MIME format.

    --_----------=_158080661890
    Content-Disposition: inline
    Content-Transfer-Encoding: 7bit
    Content-Type: text/plain; charset=us-ascii

    text: fbr
    --_----------=_158080661890
    Content-Disposition: inline
    Content-Transfer-Encoding: 7bit
    Content-Type: text/html; charset=us-ascii

    <h1>fbr</h1>
    --_----------=_158080661890--

As you can see, the template output is just "fbr": the umlauts are missing.

Any advice? Did I do something wrong? Are there alternatives I should consider which can handle UTF-8?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Re: MIME::Lite::TT::Html - issues with umlauts
by choroba (Archbishop) on Feb 04, 2020 at 09:43 UTC
    The default transfer encoding of the output is 7bit, which clearly can't represent characters outside 7-bit ASCII. Use the 8bit encoding instead:

        my $msg = MIME::Lite::TT::HTML->new(
            Encoding => '8bit',
            # ...

    Also, I don't see use utf8; anywhere in the source. If you're using UTF-8 in your source code, you should tell Perl about it.
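
To illustrate the remark about use utf8, here is a minimal sketch using only core modules. The sample string ("f", U+00FC, "r") is an assumption chosen for illustration, not the poster's original text, and the bytes are written as escapes so the example does not depend on how the file itself is saved.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Without "use utf8", a UTF-8 encoded literal reaches Perl as raw bytes.
# These escapes spell out the UTF-8 bytes of "f", U+00FC, "r".
my $bytes = "f\xC3\xBCr";

# Decoding turns the bytes into characters, which is roughly what
# "use utf8" does for literals in a UTF-8 encoded source file.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 4, characters: 3"
```

The one-byte difference is the umlaut: two bytes in UTF-8, but a single character once Perl knows how the source is encoded.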

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: MIME::Lite::TT::Html - issues with umlauts
by haj (Deacon) on Feb 04, 2020 at 10:26 UTC

    The first thing that catches my eye is the line Encode::_utf8_on($params{umlaut}); It is almost always plain wrong to use this function (and its documentation contains sufficient warnings). I guess it was an attempt to fix your issue, but it is the wrong way to go about it. So I'll start by deleting this line.
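
As a rough sketch of why flipping the flag is not a conversion, the following compares Encode::decode with Encode::_utf8_on on the same bytes. The sample bytes (an ISO-8859-1 encoded "f", U+00FC, "r") are an assumption for illustration; utf8::valid is used only to check well-formedness.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "f", U+00FC, "r" as ISO-8859-1 bytes (one byte each).
my $latin1_bytes = "f\xFCr";

# decode() converts the bytes according to a named encoding and returns
# a proper character string.
my $decoded = decode('ISO-8859-1', $latin1_bytes);

# _utf8_on() converts nothing: it only marks the existing bytes as UTF-8.
# Here 0xFC followed by "r" is not a well-formed UTF-8 sequence.
my $flagged = $latin1_bytes;
Encode::_utf8_on($flagged);

print "decoded is well-formed: ", (utf8::valid($decoded) ? "yes" : "no"), "\n";
print "flagged is well-formed: ", (utf8::valid($flagged) ? "yes" : "no"), "\n";
```

If the bytes happen to be valid UTF-8 already, _utf8_on appears to work, which is probably why it gets tried as a quick fix; decode with an explicit encoding states the assumption and validates the input instead.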

    The next thing you need to care about is the encoding in which your Perl source is stored. It contains literal umlauts, and your source code editor will decide (or take hints from you) whether it stores them as UTF-8 or as something like ISO-LATIN-1. If your editor saves as UTF-8, you need to inform the interpreter about this by including the following line in your source code: use utf8;

    For the moment I'll assume that your source file is encoded in ISO-LATIN-1, because there's more.

    The documentation of MIME::Lite::TT::HTML tells us that it has two options which are missing from your source code:

    • Encoding is the "transfer encoding"; the default value is 7bit. There you are: umlauts are outside the 7-bit range. The module should probably warn you that your content contains non-ASCII characters, but apparently it doesn't; it just deletes all offending characters (actually, this happens in MIME::Lite). Any of the other values will do the trick. Today's mail programs usually support 8bit, and the output of $msg->as_string; will be more readable for humans. If you want to play it safe, use quoted-printable.
    • Charset is the actual encoding of your text. You are using binmode STDOUT, ':utf8';, and you need to include that information in the mail so that the reader on the other end knows how to interpret the strings by setting Charset => 'utf8',.
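
To make the difference between the two options concrete, here is a small sketch with core modules only (Encode and MIME::QuotedPrint). It does not call MIME::Lite::TT::HTML; it only shows what the charset step and the quoted-printable transfer encoding each do to one body line. The sample line is an assumption for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical body line containing U+00FC.
my $text = "text: f\x{FC}r\n";

# Charset step: characters become octets in the declared encoding.
my $octets = encode('UTF-8', $text);

# Transfer-encoding step: quoted-printable escapes every octet above
# 0x7F, so the result is safe on a 7-bit transport.
my $qp = encode_qp($octets);

print $qp;    # prints "text: f=C3=BCr"
```

With 8bit the octets are sent unchanged, which is why the as_string output stays readable; quoted-printable trades that readability for safety on transports that only pass 7-bit data.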

    So this is my version of the source code, which I saved in ISO-LATIN-1 encoding:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use MIME::Lite::TT::HTML;

        my %params;
        $params{umlaut} = 'fbr';
        #Encode::_utf8_on($params{umlaut});

        my %options = (
            INCLUDE_PATH => '.',
            ENCODING     => 'UTF-8',
        );

        my $msg = MIME::Lite::TT::HTML->new(
            From        => 'admin@example.com',
            To          => 'frank@example.com',
            Subject     => 'Your recent purchase',
            Template    => {
                text => 'revsys.txt.tt',
                html => 'revsys.html.tt',
            },
            Charset     => 'utf8',
            Encoding    => '8bit',
            TmplOptions => \%options,
            TmplParams  => \%params,
        );

        binmode STDOUT, ':utf8';
        print "Should be: ", $params{umlaut},$/;
        print '-' x 79,$/;
        print $msg->as_string;

    And on a contemporary Linux console (which means: it expects UTF-8 encoding), the output looks like this:

        Should be: fbr
        -------------------------------------------------------------------------------
        Content-Transfer-Encoding: binary
        Content-Type: multipart/alternative; boundary="_----------=_158081178855330"
        MIME-Version: 1.0
        X-Mailer: MIME::Lite 3.031 (F2.85; T2.17; A2.21; B3.15; Q3.13)
        Subject: =?UTF8?B?WW91ciByZWNlbnQgcHVyY2hhc2U=?=
        Date: Tue, 4 Feb 2020 10:23:08 +0000
        From: admin@example.com
        To: frank@example.com

        This is a multi-part message in MIME format.

        --_----------=_158081178855330
        Content-Disposition: inline
        Content-Transfer-Encoding: 8bit
        Content-Type: text/plain; charset=utf8

        text: fbr
        --_----------=_158081178855330
        Content-Disposition: inline
        Content-Transfer-Encoding: 8bit
        Content-Type: text/html; charset=utf8

        <h1>fbr</h1>
        --_----------=_158081178855330--

    Note that both headers Content-Transfer-Encoding and Content-Type are now correct.
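
The Subject header in the output also carries the charset now. As a minimal sketch, the encoded word can be taken apart by hand with the core MIME::Base64 module; the regular expression is a simplified assumption that handles a single encoded word, not a full RFC 2047 parser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Subject header value copied from the output above.
my $header = '=?UTF8?B?WW91ciByZWNlbnQgcHVyY2hhc2U=?=';

# Simplified split into charset, scheme (B = Base64) and payload.
my ($charset, $scheme, $payload) =
    $header =~ /^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/
    or die "not a single encoded word\n";

my $subject = decode_base64($payload);

print "charset: $charset\n";    # prints "charset: UTF8"
print "subject: $subject\n";    # prints "subject: Your recent purchase"
```

The payload is identical in both outputs; only the charset label in the encoded word changed from US-ASCII to UTF8.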

      Thanks a lot to both of you. It works.


      s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
      +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
