PerlMonks  

utf8 in perl

by theravadamonk (Scribe)
on Jul 04, 2018 at 04:55 UTC ( [id://1217851]=perlquestion )

theravadamonk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Perl Monks,

I have a mail log file in which the Subject and From headers contain UTF-8 text.

Here's an excerpt from the mail log. Please pay attention to Subject: and From:

mail_id: 3gTkbdw2YlQK, b: pru-s06Xc, Hits: 8.111, size: 40657, Subject: "Heritage The Latest & Most Luxurious Banquet Hall In Anuradhapura (raw: =?utf-8?Q?Heritage=20The=20Latest=20&=20Most=20Luxurious=20Banquet=20Hall=20In=20Anuradhapura?=)", From: Heritage_Hotel_Anuradhapura_<ads@adzinair.net>_(raw:_=?utf-8?Q?Heritage=20Hotel=20Anuradhapura?=_<ads@adzinair.net>), X-Mailer: MailChimp_Mailer_-_**CID8fa1f72d0e77e89b5794**

I want it displayed properly, but I still can't manage it. The output STILL shows raw: =?utf-8?Q? etc.

I have added the lines below to my Perl code.

    #!/usr/bin/perl
    use CGI ':standard';
    use strict;
    use warnings;
    use CGI::Carp 'fatalsToBrowser'; # use only for testing

    print "Content-Type: text/html; charset=utf-8\n\n";

    open FILE, '<:encoding(utf8)', '/var/log/mail.log' or die $!;
    my @file = reverse <FILE>;

Anyway, I am reading the file in reverse order. Where have I gone wrong?

I look forward to your answers. I have been trying since yesterday evening, but I still haven't succeeded.

I googled a lot and came across some material. I hope this URL is useful.

https://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8

2018-07-11 Athanasius added code tags to maillog quote and linkified the Wikibooks link

Replies are listed 'Best First'.
Re: utf8 in perl
by haj (Vicar) on Jul 04, 2018 at 05:36 UTC

      Hi, thanks for directing me to a good source. Anyway, I wrote a simple Perl script. Here's my code.

          #!/usr/bin/perl
          use CGI ':standard';
          use strict;
          use warnings;
          use CGI::Carp 'fatalsToBrowser'; # use only for testing
          use Encode qw(encode decode);
          no warnings 'utf8';

          print "Content-Type: text/html; charset=utf-8\n\n";

          my $subject = "Room Rush \303\242\302\200\302\223 Enjoy 25% off on your stay. (raw: Room Rush =?utf-8?b?4oCT?= Enjoy 25% off on your stay.)";
          $subject =~ s/[^[:ascii:]]+//g; # get rid of non-ASCII characters

          my $subject_decoded = decode("MIME-Header", $subject);
          #my $subject_decoded = decode("MIME-B", $subject);
          #my $subject_decoded = decode("MIME-Q", $subject);

          print "\n";
          print "<br/>";
          print "subject: $subject \n\n";
          print "<br/>";
          print "subject_decoded: $subject_decoded \n\n";

      Here's what I get in the web browser.

      subject: Room Rush Enjoy 25% off on your stay. (raw: Room Rush =?utf-8?b?4oCT?= Enjoy 25% off on your stay.)
      subject_decoded: Room Rush Enjoy 25% off on your stay. (raw: Room Rush – Enjoy 25% off on your stay.)

      But the word "raw:" still appears?

      Is this code good? How can I improve it? I spent many hours writing it, since I am still learning Perl.

        I'm not quite sure how you are getting that output. Here's an SSCCE which you might be able to tailor to your requirements. I've removed all the CGI so you can just run this from the command line.

            #!/usr/bin/env perl
            use strict;
            use warnings;
            use Encode qw(encode decode);

            my $subject = "Room Rush \303\242\302\200\302\223 Enjoy 25% off on your stay. (raw: Room Rush =?utf-8?b?4oCT?= Enjoy 25% off on your stay.)";
            my $decoded = decode ("MIME-Header", $subject);
            print encode ("UTF-8", $decoded) . "\n";

        Running this gives:

        Room Rush â Enjoy 25% off on your stay. (raw: Room Rush – Enjoy 25% off on your stay.)
        

        which demonstrates that we have successfully decoded the raw part of the header (and encoded it to UTF-8 for output). HTH.

        Maybe I misunderstand you, but raw: is in the original input string too.
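
Since "raw:" is part of the logged field itself, one way to drop it is to pull out only the text inside "(raw: ...)" and decode that, rather than decoding the whole field. What follows is a minimal sketch, not a definitive implementation: the helper name decode_raw_field and the regular expression are assumptions made up for illustration, and the pattern assumes the field ends with the closing parenthesis of the raw part, as in the sample string used in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not from the thread): extract the text inside
# "(raw: ...)" and decode its MIME encoded-words. If the field has no
# raw part, the whole field is decoded instead.
sub decode_raw_field {
    my ($field) = @_;
    my $raw = $field =~ /\(raw:\s*(.*)\)\s*\z/s ? $1 : $field;
    return decode('MIME-Header', $raw);
}

my $subject = "Room Rush \303\242\302\200\302\223 Enjoy 25% off on your stay. "
            . "(raw: Room Rush =?utf-8?b?4oCT?= Enjoy 25% off on your stay.)";

# $decoded is a character string; encode it again for output.
my $decoded = decode_raw_field($subject);
print encode('UTF-8', $decoded), "\n";
```

In the From field of the original log line, underscores stand in for spaces, so the pattern would need adjusting there (for example, allowing "_" after "raw:").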

Technical note (Re: utf8 in perl)
by Dallaylaen (Chaplain) on Jul 09, 2018 at 07:41 UTC
    Could you please wrap the e-mail header example in <code>...</code> so that looooong lines are wrapped? As it stands, it breaks formatting on the front page. Thank you.
Re: utf8 in perl
by Anonymous Monk on Jul 04, 2018 at 13:01 UTC
    Bear in mind also that whatever you use to display the data must be aware that it is being given UTF-8 data. Some software is prepared to handle it; some turns it into an ASCII display string; some displays garbage. PerlMonks correctly displays symbols such as © Copyright because the browser knows to expect UTF.
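
On the Perl side, the point above roughly comes down to two things: encode what you print, and declare that encoding to whatever displays it. The sketch below assumes a CGI-style script writing to STDOUT, like the code in the question. The sample header string is taken from this thread; the rest is illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Encode every character printed to STDOUT as UTF-8. Without such a
# layer, printing a character above 255 triggers a
# "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';

# Tell the browser which encoding the bytes are in.
print "Content-Type: text/html; charset=utf-8\n\n";

# decode() returns a character string containing U+2013 (en dash).
my $text = decode('MIME-Header', 'Room Rush =?utf-8?b?4oCT?= Enjoy');
print "<p>subject: $text</p>\n";
```

With the output layer in place there is no need to call encode() by hand before each print.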
