PerlMonks  

Transcoding MIME Strings

by PoorLuzer (Beadle)
on Oct 09, 2009 at 18:08 UTC ( [id://800354]=perlquestion )

PoorLuzer has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on some email-related processing recently, mainly extracting information from raw mail headers.

There are two things I want to understand. 1. How do I map strings like these:

=?iso-8859-1?Q?Communiqu=E9?= =?iso-8859-1?Q?Telef=F3nica?= =?ISO-8859-1?Q?Montre=E1l?= =?iso-8859-1?Q?Minist=E8re?=

to these:

Communique Telefonica Montreal Ministere

I get the same unhelpful results (they do not give me the desired mappings above) regardless of whether I use:

use MIME::Words qw(:all);
my $rawSub = $head->get( 'subject' );
my $mailSubject = decode_mimewords( $rawSub );

or I use:

use MIME::WordDecoder;
my $wd = supported MIME::WordDecoder "US-ASCII";
### Decode a MIME string (e.g., into Latin1) via the default decoder:
my $str = $wd->decode( $rawSub );
print $str, "\n";

For some reason that I am unable to understand, the following fails completely (the output shows the unknown-character designator '_' in place of every character!):

use MIME::WordDecoder;
my $wd = new MIME::WordDecoder::US_ASCII;
$wd->unknown( '_' );   # What to translate unknown characters to
$wd->collapse( 1 );    # Collapse runs of unknown characters to a single unknown
print $wd->decode( $rawSub ), "\n";

2. How do I make Date::Manip::UnixDate completely ignore the timezone part of the date string?

Here is some code:

use Date::Manip;
Date_Init( "ConvTZ=IGNORE", "TZ=GMT" );
# We don't want time conversions happening. GMailBackup ignores timezone conformance and keeps timestamps as they were
# If using Date::Transform, you can use '%g' that is short for %a, %d %b %Y %H:%M:%S %z - Fri, 28 Apr 1995 17:23:15 EDT; For eg : Sat, 9 Feb 2008 17:04:08 -0330
$mailDate = UnixDate( $mailDate , '%Y_%m_%Q-%H%M%S' );   # 2008_02_20080209-170408

For this example, consider the date string:

Sat, 9 Feb 2008 17:04:08 -0380

The timezone in the string is clearly bogus (its minutes value is 80, which exceeds 59), yet this happens in "real life".

As is evident from the code, I don't need the timezone information anyway. So how do I reliably "lop it off" (and is that recommended)?

I would rather have someone tell me about a magic flag that I am unaware of that makes UnixDate ignore the timezone, since "chopping it off" promises to complicate life even more (time zones can come in different formats, e.g. EET, -330, +710, etc.).

Replies are listed 'Best First'.
Re: Transcoding MIME Strings
by zwon (Abbot) on Oct 09, 2009 at 18:24 UTC

    The following code produces the desired output:

    use strict;
    use warnings;
    use 5.010;
    use MIME::WordDecoder;
    use open ':encoding(utf8)';
    use open ':std';

    my $wd = supported MIME::WordDecoder "iso-8859-1";
    for my $enc (
        '=?iso-8859-1?Q?Communiqu=E9?=',
        '=?iso-8859-1?Q?Telef=F3nica?=',
        '=?ISO-8859-1?Q?Montre=E1l?=',
        '=?iso-8859-1?Q?Minist=E8re?='
    ) {
        say $wd->decode($enc);
    }
    __END__
    Communiqué
    Telefónica
    Montreál
    Ministère

    Update: and MIME::Words gives me the same output:

    use strict;
    use warnings;
    use 5.010;
    use open ':utf8';
    use open ':std';
    use MIME::Words qw(:all);

    my @encoded = (
        '=?iso-8859-1?Q?Communiqu=E9?=',
        '=?iso-8859-1?Q?Telef=F3nica?=',
        '=?ISO-8859-1?Q?Montre=E1l?=',
        '=?iso-8859-1?Q?Minist=E8re?=',
    );
    for (@encoded) {
        say scalar decode_mimewords($_);
    }

      I wanted:

      Communique Telefonica Montreal Ministere

      instead of:

      Communiqué Telefónica Montreál Ministère

      That is the problem I am trying to solve - I want the output to be in the "ASCII" character set.

        You could transliterate into ASCII in a second step, e.g. using Text::Unidecode:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use MIME::Words qw(:all);
        use Text::Unidecode;

        my @encoded = (
            '=?iso-8859-1?Q?Communiqu=E9?=',
            '=?iso-8859-1?Q?Telef=F3nica?=',
            '=?ISO-8859-1?Q?Montre=E1l?=',
            '=?iso-8859-1?Q?Minist=E8re?=',
        );
        for (@encoded) {
            my $s = decode_mimewords($_);
            print unidecode($s), "\n";
        }
        __END__
        Communique
        Telefonica
        Montreal
        Ministere
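
If installing Text::Unidecode is not an option, a similar two-step idea (decode first, then fold to ASCII) can probably be sketched with core modules only. This is a different technique from what Text::Unidecode does: it decomposes accented letters with Unicode::Normalize and drops the combining marks, so it only helps for characters that decompose into an ASCII base letter. The helper name mime_to_ascii and the choice of '_' for anything left over are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decode RFC 2047 encoded-words, then fold to ASCII.
sub mime_to_ascii {
    my ($raw) = @_;
    my $text = decode('MIME-Header', $raw);  # encoded-words -> Perl characters
    $text = NFD($text);                      # split "é" into "e" + combining accent
    $text =~ s/\p{Mn}//g;                    # drop the combining marks
    $text =~ s/[^\x00-\x7F]/_/g;             # assumption: replace anything still non-ASCII
    return $text;
}

for my $enc (
    '=?iso-8859-1?Q?Communiqu=E9?=',
    '=?iso-8859-1?Q?Telef=F3nica?=',
    '=?ISO-8859-1?Q?Montre=E1l?=',
    '=?iso-8859-1?Q?Minist=E8re?=',
) {
    print mime_to_ascii($enc), "\n";
}
```

Characters without a decomposition (for example "ß" or "ø") would come out as '_' here, which is where a transliteration table such as the one in Text::Unidecode does better.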
Re: Transcoding MIME Strings
by PoorLuzer (Beadle) on Oct 10, 2009 at 02:12 UTC
    Any insights into the second question, the timezone issue?
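
The thread does not name a Date::Manip flag for this. One way to sidestep the problem, sketched here as an assumption rather than a recommendation from the replies, is to not hand the zone to a date parser at all: match only the day, month, year and time fields of an RFC 2822 style date and ignore whatever follows the seconds. That avoids having to recognise every zone format (EET, -330, +710, or a bogus -0380). The names format_mail_date and %MONTH are hypothetical, and the sketch does no range checking on the fields.

```perl
use strict;
use warnings;

# English month abbreviations as used in mail Date: headers.
my %MONTH = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: returns a string shaped like '%Y_%m_%Q-%H%M%S',
# or nothing if the leading fields cannot be recognised.
sub format_mail_date {
    my ($raw) = @_;
    my ($day, $mon, $year, $hh, $mm, $ss) = $raw =~ m{
        ^\s*
        (?:[A-Za-z]{3},\s*)?   # optional weekday such as "Sat,"
        (\d{1,2})\s+           # day of month
        ([A-Za-z]{3})\s+       # month abbreviation
        (\d{4})\s+             # four-digit year
        (\d{1,2}):(\d{2})      # hours and minutes
        (?::(\d{2}))?          # optional seconds
    }x or return;              # anything after this point (the zone) is never examined

    my $month = $MONTH{ ucfirst lc $mon } or return;
    $ss = 0 unless defined $ss;

    return sprintf '%04d_%02d_%04d%02d%02d-%02d%02d%02d',
        $year, $month, $year, $month, $day, $hh, $mm, $ss;
}

print format_mail_date('Sat, 9 Feb 2008 17:04:08 -0380'), "\n";
```

For the example date this should give 2008_02_20080209-170408, the same shape as the comment in the original code, with the timestamp kept exactly as written and no zone conversion applied.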
