Parsing an email

BernieC has asked for the wisdom of the Perl Monks concerning the following question:

I'm lost in a bunch of twisty passages, all in the Email:: world :o). My problem is {I think} very simple, but the Email modules seem much more focused on creating/modifying messages, and I can't see how to just *examine* a message.

What I want to do is write a mock email reader. This means I need to parse out the headers {I just need things like from/to/subject/date} and then find the "body" of the message. There seem to be three types of incoming emails: one is plain text; another is plain HTML {that is, not multipart, just HTML; I got one just today}:

X-CMAE-Envelope: MS4xfLUIIc3gwFFCUTu1+RYnII5snX2pyaUrABakvIQ567LlL7RBFLy4Wo65N93eCIInGj50aDn6TLwhXwJbk7HKUHu2pUzH8OWeKTJoF2xE/w3tkTQrR8cj
 Kh4gBf/TMflzvBVgeRGN7++n/ZIwr/endxydKhxB1KRKrAoSBcA1O3+KsH4dy7QKym+yU9SP+8B9fQ==
X-PMFLAGS: 34095744 0 65537 PQVHWQ2O.CNM
X-CC-Diagnostic: Body contains "click here" (20)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns:o=3D"urn:schemas-microsoft-com:office:office" xmlns:v=3D"urn:sc=
hemas-microsoft-com:vml">=20
 <head> <!--[if gte mso 9]><xml><o:OfficeDocumentSettings><o:AllowPNG/><o:P=
ixelsPerInch>96</o:PixelsPerInch></o:OfficeDocumentSettings></xml><![endif]=
-->=20

and of course a multipart message {in which case I'd want the HTML part}.

This feels like it should be easy, but so much of Email::* is occupied with modifying/adding/MIMEing, etc., that I can't separate out the simple "parse and extract" machinery I need. Any advice/guidance/tutorial? THANKS
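
For the non-multipart case shown in the sample, the mechanics are small enough to sketch with core modules only. The following is a rough illustration, not a replacement for a real parser: `split_message` is a hypothetical helper, the addresses are made up, and it assumes a single-part message whose body is at most quoted-printable encoded (as the `=3D` and trailing `=` in the sample suggest). It does no charset decoding and does not handle multipart messages at all.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);    # ships with Perl

# Hypothetical helper: split a raw single-part message into headers and body.
sub split_message {
    my ($raw) = @_;
    $raw =~ s/\r\n/\n/g;                          # normalise line endings
    my ( $head, $body ) = split /\n\n/, $raw, 2;  # first blank line ends the headers
    $body //= '';
    $head =~ s/\n[ \t]+/ /g;                      # unfold continuation lines

    my %header;
    for my $line ( split /\n/, $head ) {
        my ( $name, $value ) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $header{ lc $name } //= $value;           # keep the first occurrence only
    }

    # Undo quoted-printable if the headers say the body uses it.
    my $cte = lc( $header{'content-transfer-encoding'} // '' );
    $body = decode_qp($body) if $cte eq 'quoted-printable';

    return ( \%header, $body );
}

# Made-up message for illustration.
my $raw = join "\n",
    'From: sender@example.com',
    'To: reader@example.com',
    'Subject: A folded',
    ' subject line',
    'Content-Type: text/html; charset=us-ascii',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '<html xmlns:o=3D"urn:x"><body>click he=',
    're</body></html>',
    '';

my ( $header, $body ) = split_message($raw);
print "$header->{from} / $header->{subject}\n";
print $body;
```

Anything beyond this (multipart, base64, encoded header words, charsets) is where a dedicated module earns its keep.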


Replies are listed 'Best First'.
Re: Parsing an email by kcott (Archbishop) on Jan 09, 2022 at 08:25 UTC
G'day BernieC, Take a look at Courriel. I haven't used it myself; however, its documentation indicates it has straightforward methods that return all the things you want (`from()`, `to()`, `subject()`, `plain_body_part()`, `html_body_part()`, and so on). It's been around for over a decade with many updates (see Changes) over the years, the last being just a few months ago. Its author, Dave Rolsky, is a well-known and respected CPAN contributor. -- Ken
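
A minimal sketch of how the methods named above might fit together for a mock reader. It assumes Courriel is installed from CPAN and behaves as its documentation describes; `summarize_email` and the sample message are hypothetical, and the exact return types are worth checking against the installed version.

```perl
use strict;
use warnings;
use Courriel;    # CPAN module, not part of core Perl

# Hypothetical helper: pull out the fields a mock reader would display.
sub summarize_email {
    my ($raw) = @_;

    # parse() expects the raw message as bytes, not decoded characters.
    my $email = Courriel->parse( text => $raw );

    # Prefer the HTML part; fall back to plain text if there is none.
    my $part = $email->html_body_part || $email->plain_body_part;

    my $from = $email->from;    # a single address object, or undef
    return {
        from    => $from ? $from->address : undef,
        to      => [ map { $_->address } $email->to ],
        subject => $email->subject,
        date    => $email->datetime->iso8601,
        body    => $part ? $part->content : '',
    };
}

# Made-up message for illustration.
my $raw = join "\n",
    'From: Sender <sender@example.com>',
    'To: reader@example.com',
    'Subject: Hello',
    'Date: Sun, 09 Jan 2022 08:25:00 +0000',
    'Content-Type: text/html; charset=us-ascii',
    '',
    '<p>Hi</p>',
    '';

my $summary = summarize_email($raw);
print "$summary->{from} | $summary->{subject} | $summary->{date}\n";
print $summary->{body}, "\n";
```

The same helper should cover all three cases from the question, since the body-part lookups are documented to work on plain text, HTML-only, and multipart messages alike.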
Re^2: Parsing an email by BernieC (Pilgrim) on Jan 09, 2022 at 19:34 UTC
Mail::Box looks like it'll work but Courriel looks like exactly what I wanted: "This class exists to provide a high level API for working with emails, particular for processing incoming email." Thanks!!
Re^3: Parsing an email by BernieC (Pilgrim) on Jan 09, 2022 at 23:29 UTC
Dumb question -- I did a "cpan i Courriel" and it seems to have installed without its documentation: running `perldoc Courriel` at the `D:\>` prompt prints `No documentation found for "Courriel".` I've downloaded the tar.gz for it and pawed through it, and I can't see anything that looks like a .pod or .1 file in it, so I dunno what to do next.
Re^4: Parsing an email by kcott (Archbishop) on Jan 10, 2022 at 01:51 UTC
Re^4: Parsing an email by Fletch (Bishop) on Jan 10, 2022 at 01:57 UTC
Re^5: Parsing an email by BernieC (Pilgrim) on Jan 10, 2022 at 19:26 UTC
Re^5: Parsing an email by BernieC (Pilgrim) on Jan 10, 2022 at 19:53 UTC
Some notes below your chosen depth have not been shown here
Re: Parsing an email by Fletch (Bishop) on Jan 09, 2022 at 04:13 UTC
I've used Mail::Box to work with a maildir folder or three (pulled down with offlineimap), moving things around based on headers. Might do what you need. The cake is a lie. The cake is a lie. The cake is a lie.


	PerlMonks