PerlMonks  

Parsing an email

by BernieC (Pilgrim)
on Jan 08, 2022 at 20:02 UTC

BernieC has asked for the wisdom of the Perl Monks concerning the following question:

I'm lost in a bunch of twisty passages, all in the Email:: world :o). My problem is {I think} very simple, but the Email modules seem much more focused on creating/modifying messages, and I can't see how to just *examine* a message.

What I want to write is a mock email reader. This means I need to parse out the headers {I just need things like from/to/subject/date} and then find the "body" of the message. There seem to be three types of incoming email: one is plain text; another is plain HTML {that is, not multipart but just HTML; I got one just today}:

    X-CMAE-Envelope: MS4xfLUIIc3gwFFCUTu1+RYnII5snX2pyaUrABakvIQ567LlL7RBFLy4Wo65N93eCIInGj50aDn6TLwhXwJbk7HKUHu2pUzH8OWeKTJoF2xE/w3tkTQrR8cj
     Kh4gBf/TMflzvBVgeRGN7++n/ZIwr/endxydKhxB1KRKrAoSBcA1O3+KsH4dy7QKym+yU9SP+8B9fQ==
    X-PMFLAGS: 34095744 0 65537 PQVHWQ2O.CNM
    X-CC-Diagnostic: Body contains "click here" (20)

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
    w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns:o=3D"urn:schemas-microsoft-com:office:office" xmlns:v=3D"urn:sc=
    hemas-microsoft-com:vml">=20
    <head>
    <!--[if gte mso 9]><xml><o:OfficeDocumentSettings><o:AllowPNG/><o:P=
    ixelsPerInch>96</o:PixelsPerInch></o:OfficeDocumentSettings></xml><![endif]=
    -->=20

and of course a multipart message {in which case I'd want the HTML part}.

This feels like it should be easy, but so much of Email::* is occupied with modifying/adding/MIMEing, etc. that I can't separate out the simple "parse and extract" machinery I need. Any advice/guidance/tutorial? THANKS

Replies are listed 'Best First'.
Re: Parsing an email
by kcott (Archbishop) on Jan 09, 2022 at 08:25 UTC

    G'day BernieC,

    Take a look at Courriel. I haven't used it myself; however, its documentation indicates it has straightforward methods that return all the things you want (from(), to(), subject(), plain_body_part(), html_body_part(), and so on).

    It's been around for over a decade with many updates (see Changes) over the years, the last being just a few months ago. Its author, Dave Rolsky, is a well-known and respected CPAN contributor.
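
    As a rough illustration only, here is a minimal sketch of how those methods might be combined for a mock reader. It is written from the Courriel documentation rather than from tested experience; the helper name summarize_email and the keys of the hash it returns are made up for this example, and it assumes a Courriel version where from() returns a single address object.

```perl
use strict;
use warnings;
use Courriel;

# Hypothetical helper (not part of Courriel): parse one raw message and
# return the handful of fields a simple reader would display.
sub summarize_email {
    my ($raw) = @_;

    my $email = Courriel->parse( text => $raw );

    # Prefer the HTML part; fall back to plain text when there is none.
    # Both methods also work on a message that is not multipart.
    my $html = $email->html_body_part;
    my $part = $html || $email->plain_body_part;

    my $from = $email->from;

    return {
        from    => ( $from ? $from->format : undef ),
        to      => [ map { $_->format } $email->to ],
        subject => $email->subject,
        date    => $email->datetime->iso8601,    # DateTime object, in UTC
        is_html => ( $html ? 1 : 0 ),
        # content() returns the decoded body (quoted-printable/base64 undone)
        body    => ( $part ? $part->content : undef ),
    };
}
```

    Calling summarize_email($raw) on the slurped text of a message should give back a hash reference with the headers and the preferred body, so the three cases (plain text, bare HTML, multipart) go through the same code path.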

    — Ken

      Mail::Box looks like it'll work but Courriel looks like exactly what I wanted: "This class exists to provide a high level API for working with emails, particular for processing incoming email." Thanks!!
        Dumb question -- I did a "cpan i Courriel" and it seems to have installed without its documentation:
        D:\>perldoc Courriel
        No documentation found for "Courriel".
        I've downloaded the tar.gz for it and pawed through it, and I can't see anything that looks like a .pod or .1 file in it, so I dunno what to do next.
Re: Parsing an email
by Fletch (Bishop) on Jan 09, 2022 at 04:13 UTC

    I'd used Mail::Box working with a maildir folder or three (pulled down with offlineimap) moving things around based on headers. Might do what you need.
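
    For the read-only case in the question, a sketch along these lines might be a starting point. It uses Mail::Message (the message class that Mail::Box builds on) to parse a raw string directly, with no folder involved; the helper name extract_body and the shape of its return value are invented for this example, and the approach is an assumption about what fits the use case rather than a tested recipe.

```perl
use strict;
use warnings;
use Mail::Message;

# Hypothetical helper: pull basic headers plus the HTML body (or the
# plain-text body if there is no HTML) out of one raw message string.
sub extract_body {
    my ($raw) = @_;

    my $msg = Mail::Message->read($raw);

    # Flatten multipart messages into their leaf parts; a single-part
    # message is treated as its own only part.
    my @parts = $msg->isMultipart ? $msg->parts('RECURSE') : ($msg);

    my ($html)  = grep { lc $_->contentType eq 'text/html' }  @parts;
    my ($plain) = grep { lc $_->contentType eq 'text/plain' } @parts;
    my $chosen  = $html || $plain;

    return {
        from    => scalar $msg->get('From'),
        subject => $msg->subject,
        type    => ( $chosen ? lc $chosen->contentType : undef ),
        # decoded() undoes the transfer encoding before stringifying
        body    => ( $chosen ? $chosen->decoded->string : undef ),
    };
}
```

    The same function would then cover plain-text, HTML-only, and multipart messages, since all three end up as a list of leaf parts to choose from.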

    The cake is a lie.
