PerlMonks  

Robust parsing of email messages

by downer (Monk)
on Mar 25, 2009 at 04:01 UTC

downer has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a paper for CEAS on the TREC 2007 data set. This data has ~50000 raw email files. From these files, I wish to extract the sender, the recipient, the subject, and the (parsed) text of the message body. I have tried Email::MIME, but some messages make it crash with "Illegal Content-Type parameter"; I guess these correspond to non-MIME emails? Anyway, I figure this is a fairly elementary task, and I probably haven't been looking in the right places. Does anyone have a code snippet that accomplishes this task, or alternatively, can anyone point me to a module (or modules) that does what I'm looking for?
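For the header part of this task (sender, recipient, subject), core Perl alone gets quite far without ever dying on a malformed MIME parameter. The sketch below is a hand-rolled illustration, not how any module mentioned in this thread works internally: the helper names `split_message` and `decode_header` are hypothetical, it assumes one message per string, and it leaves the body undecoded (no MIME parts, no transfer encodings). It uses only core modules; `Encode`'s `MIME-Header` encoding decodes RFC 2047 encoded-words such as `=?UTF-8?B?...?=` in the subject.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: split one raw message into a hash of headers
# (lower-cased names, first occurrence wins) and the undecoded body.
# It never looks at Content-Type, so bad MIME parameters cannot kill it.
sub split_message {
    my ($raw) = @_;
    $raw =~ s/\r\n/\n/g;                          # normalise CRLF to LF
    my ($head, $body) = split /\n\n/, $raw, 2;    # headers end at first empty line
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    $head =~ s/\n[ \t]+/ /g;                      # unfold continuation lines

    my %hdr;
    for my $line (split /\n/, $head) {
        # skip anything that is not "Name: value" (e.g. an mbox "From " line)
        next unless $line =~ /^([^:\s]+):[ \t]*(.*)$/;
        $hdr{ lc $1 } //= $2;
    }
    return (\%hdr, $body);
}

# Hypothetical helper: decode RFC 2047 encoded-words, falling back to
# the raw text if decoding fails.
sub decode_header {
    my ($value) = @_;
    return '' unless defined $value;
    my $decoded = eval { decode('MIME-Header', $value) };
    return defined $decoded ? $decoded : $value;
}

my $sample = "From: alice\@example.com\n"
           . "To: bob\@example.com\n"
           . "Subject: =?UTF-8?B?SGVsbG8=?=\n"
           . " world\n"
           . "\n"
           . "Body text here.\n";

my ($hdr, $body) = split_message($sample);
my $subject = decode_header($hdr->{subject});
binmode STDOUT, ':encoding(UTF-8)';
print "From: $hdr->{from}\nTo: $hdr->{to}\nSubject: $subject\n";
```

The body returned here is still raw, so for multipart messages you would still need a MIME parser to get at the text.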

Re: Robust parsing of email messages
by CountZero (Bishop) on Mar 25, 2009 at 06:41 UTC
    The few times I had to deal with e-mails, Email::Simple served me well.
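A minimal Email::Simple sketch of the extraction the question asks for. `Email::Simple->new`, `header`, and `body` are the module's documented methods; the `summarize` helper and the one-message-per-file layout are assumptions for illustration. Email::Simple only separates the headers from the body and never parses Content-Type, so it sidesteps the "Illegal Content-Type parameter" error, but it also leaves MIME parts, transfer encodings, and encoded-word subjects undecoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::Simple;

# Hypothetical helper: pull the fields the question asks for out of one
# raw message. The body is returned as-is (still MIME-encoded if it was).
sub summarize {
    my ($raw) = @_;
    my $email = Email::Simple->new($raw);
    return {
        from    => scalar $email->header('From'),      # scalar context: first value only
        to      => scalar $email->header('To'),
        subject => scalar $email->header('Subject'),
        body    => $email->body,
    };
}

# Usage: perl summarize.pl file1 file2 ...   (one raw message per file)
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or do { warn "skip $path: $!\n"; next };
    my $raw = do { local $/; <$fh> };
    close $fh;
    my $info = summarize($raw);
    print join("\t", $path, map { defined $_ ? $_ : '' } @{$info}{qw(from to subject)}), "\n";
}
```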

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Robust parsing of email messages
by afoken (Chancellor) on Mar 25, 2009 at 08:32 UTC

    I use MIME::Parser; it works quite well, even with malicious e-mails.
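A sketch of the MIME::Parser approach, assuming MIME-tools is installed and each message fits in memory. `parse_data`, `output_to_core`, `head->unfold`, `head->get`, `parts_DFS`, `effective_type`, and `bodyhandle` come from the MIME-tools distribution; the `extract` helper and the choice to keep only `text/plain` parts are illustrative assumptions (an HTML-only message would yield no text here).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: parse one complete raw message (headers AND body)
# and return From/To/Subject plus the decoded text of every text/plain part.
sub extract {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);               # keep decoded bodies in memory, not on disk
    my $entity = $parser->parse_data($raw);

    my $head = $entity->head;
    $head->unfold;                            # join folded header lines
    my %info;
    for my $name (qw(From To Subject)) {
        my $value = $head->get($name);        # first occurrence, with trailing newline
        chomp $value if defined $value;
        $info{ lc $name } = $value;
    }

    # parts_DFS returns the entity itself followed by all nested parts;
    # a non-MIME message is just one text/plain entity.
    my @text;
    for my $part ($entity->parts_DFS) {
        next unless $part->effective_type eq 'text/plain';
        my $body = $part->bodyhandle or next;  # multipart containers have no body
        push @text, $body->as_string;          # transfer encoding already undone
    }
    $info{text} = join "\n", @text;
    return \%info;
}

my $info = extract("From: alice\@example.com\nTo: bob\@example.com\n"
                 . "Subject: Hi\n\nNo MIME headers at all.\n");
print "$info->{from} -> $info->{to}: $info->{subject}\n$info->{text}";
```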

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      This is actually quite a difficult task, much harder than I had anticipated. I have been playing with the following code, which produces some weird behavior:

        #!/usr/bin/perl
        #use Email::MIME;
        use Email::Simple;
        use Data::Dumper;
        use MIME::Parser;
        use HTML::SimpleParse;
        use strict;
        use warnings;

        undef $/;
        my $message = <>;
        my $email = Email::Simple->new($message);
        print $email->header("To"), "\t", $email->header('Content-Type'), "\n";
        my $parser = MIME::Parser->new;
        my $data = $parser->parse($email->body);
        my $results = $parser->results;
        print Dumper($results);

      I don't know how to handle the fact that some messages are MIME and some are not. Plus, this doesn't actually seem to "parse" the message: the MIME metadata is still within the email. Suggestions? Thanks!
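One likely cause of what the script above shows: it hands MIME::Parser only `$email->body`, so the top-level Content-Type header (which carries the multipart boundary) never reaches the parser, which then cannot split the parts; also, `parse` expects a filehandle, whereas `parse_data` accepts a string. The sketch below feeds the whole message to `parse_data` and wraps it in `eval` so one broken file does not stop a run over thousands of messages. It also illustrates that MIME and non-MIME mail need no separate code paths: a message without MIME headers comes back as a single `text/plain` entity. The `parse_message` helper and the one-message-per-file layout are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);                  # keep decoded bodies in memory

# Hypothetical helper: parse one complete message (headers AND body).
# Returns the top-level MIME::Entity, or undef if the parser died.
sub parse_message {
    my ($raw) = @_;
    my $entity = eval { $parser->parse_data($raw) };
    warn "parse failed: $@" unless $entity;
    return $entity;
}

# Usage: perl parse_all.pl /path/to/corpus/*   (one raw message per file)
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or do { warn "skip $path: $!\n"; next };
    my $raw = do { local $/; <$fh> };        # slurp the whole file, headers included
    close $fh;

    my $entity = parse_message($raw) or next;
    my $to = $entity->head->get('To');
    chomp $to if defined $to;
    # mime_type is text/plain for non-MIME mail and multipart/... otherwise;
    # the same entity API (parts_DFS, bodyhandle) works for both.
    printf "%s\t%s\t%s\n", $path, $entity->mime_type, defined $to ? $to : '';
    print "  parser note: $_\n" for $parser->results->errors;
}
```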

Node Type: perlquestion [id://753002]
Approved by ikegami