Parsing a Unix Mbox

by Anonymous Monk
on Jul 16, 2002 at 18:53 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to read each message in a Unix mbox and extract the header information and the message body from each one. I need to run edit checks on both. Once I'm done, I need to move the message to an output directory and delete it from the mailbox. I'm new to Perl. Do you recommend using a Perl module, e.g., Mail::Box? I need some guidance.

Replies are listed 'Best First'.
Re: Parsing a Unix Mbox
by Corion (Patriarch) on Jul 16, 2002 at 18:59 UTC

    Yes. I really recommend Mail::Box. The documentation has some examples that should get you started, and the mail handling is painless in my opinion. Installation should be straightforward if you have superuser permissions on your system; otherwise, a local installation of the modules will work as well.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      My systems admin installed Mail::Box version 1.324; Mail::Box version 2.015 could not be installed because it needs Perl 5.6. I ran the two examples that come with version 1.324, but I need some help with using these modules. Which module do I use to extract the body of the message? Can you show me a sample script that prints the body of the message? I ran the example script that loops through each message in the mbox and prints the headers. I'm new to Perl and Perl modules. Is this task difficult to accomplish without using a module? If so, could you briefly explain why, so that I can explain it to my boss? Thanks.

        Yes, this task is very difficult to do right without a module, and I won't explain it here, as explaining it means reformulating the mbox manpage (available online, for example), and Mail::Box handles it very nicely. I have only used Mail::Box from version 2 onwards, but I guess that the basic methods haven't changed that much since:

        #!/usr/bin/perl -w
        # Some vestige of local delivery
        # For another method, have a look at Mail::LocalDelivery
        use strict;

        use Mail::Box;
        use Mail::Box::Manager;
        use Mail::Message;
        use Mail::Message::Construct;

        # $host is not used below; it is declared so the script compiles under strict
        use vars qw($host $localmailbase $foldername);
        use vars qw($mgr $folder);

        $host          = 'hera.informatik.uni-frankfurt.de';
        $localmailbase = "/home/corion/mail/";
        $foldername    = "informatik";

        $mgr = Mail::Box::Manager->new(
            folderdir           => $localmailbase,
            default_folder_type => 'mbox',
        );
        $folder = $mgr->open( folder => $foldername, access => 'rw', create => 1 );
        die "Couldn't open mailfolder '$foldername' : $!\n" unless $folder;
        print "Using folder ", $folder->name, "\n";

        # Index the messages by their Message-ID header
        my %messageIDs;
        %messageIDs = map { $_->get("Message-ID"), $_ } ($folder->messages);

        # Print the body of every message in the folder
        my $msg;
        foreach $msg ($folder->messages) {
            print "*** The content of this message is :\n";
            print $msg->body;
        }

        $mgr->close($folder);

        I don't have access to any older Mail::Box documentation at the moment, but each message should have the body method, which returns the body of the message as a string.
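To give a rough idea of what "doing it without a module" involves, here is a minimal sketch in core Perl. It is not how Mail::Box works internally, and the names `split_mbox` and `parse_message` are made up for this illustration. It assumes the common convention that a message starts at a line beginning with "From " that is either the first line of the file or follows a blank line.

```perl
use strict;
use warnings;

# Split the text of an mbox file into one string per message.
# Assumption: a separator is a line starting with "From " that is the first
# line of the file or directly follows a blank line.
sub split_mbox {
    my ($text) = @_;
    my (@messages, @current);
    my $after_blank = 1;    # the start of the file counts as "after a blank line"

    for my $line (split /^/m, $text) {
        if ($after_blank && $line =~ /^From /) {
            push @messages, join('', @current) if @current;
            @current = ();
        }
        push @current, $line;
        $after_blank = ($line =~ /^\r?\n\z/) ? 1 : 0;
    }
    push @messages, join('', @current) if @current;
    return @messages;
}

# Separate one message into its "From " envelope line, a hash of headers
# (names lowercased, folded lines joined, repeated headers keep the last
# value), and the body.
sub parse_message {
    my ($message) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $message, 2;
    $body = '' unless defined $body;

    my ($envelope, @lines) = split /\r?\n/, $head;
    my (%header, $last);
    for my $line (@lines) {
        if ($line =~ /^\s+(.*)/ && defined $last) {
            $header{$last} .= " $1";          # continuation of the previous header
        }
        elsif ($line =~ /^([^:\s]+):\s*(.*)/) {
            $last = lc $1;
            $header{$last} = $2;
        }
    }
    return { envelope => $envelope, header => \%header, body => $body };
}

# Usage sketch: print the Subject of every message in the file given as argument.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    for my $message (split_mbox($text)) {
        my $subject = parse_message($message)->{header}{subject};
        print defined $subject ? $subject : '(no subject)', "\n";
    }
}
```

Even this much gets things wrong in real mailboxes: a body paragraph that begins with "From " after a blank line is mistaken for a new message unless the writer quoted it, and the sketch ignores MIME parts, character encodings, file locking, and rewriting the mailbox after a delete. Those are the parts a module takes care of.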

Re: Parsing a Unix Mbox
by Cine (Friar) on Jul 16, 2002 at 22:22 UTC
    use MIME::Parser;
    use Mail::Util qw( read_mbox );

    my @messages = read_mbox('filename');
    my @parsed_messages = map {(new MIME::Parser)->parse($$_)} @messages;


    T I M T O W T D I
      Can you please tell me why the above doesn't work with Perl 5.18? I am trying to understand how to resolve the error on the last line: Not a SCALAR reference at ..
        After RTFM, it looks like both modules are deprecated and broken. read_mbox: "This method does not quote lines which accidentally also start with the message separator From". parse: "DEPRECATED: A ref to an array of scalars".
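The error itself can be reproduced without either module. As far as the Mail::Util documentation describes it, `read_mbox` returns one array reference of lines per message, so `$$_` tries to dereference an array reference as a scalar. The sketch below uses hypothetical sample data in that shape and a made-up helper, `message_texts`, that joins each message's lines into a single string.

```perl
use strict;
use warnings;

# Join the lines of each message (one array reference per message, the shape
# read_mbox is documented to return) into one string per message.
sub message_texts {
    my @messages = @_;
    return map { join '', @$_ } @messages;
}

# Hypothetical sample data standing in for the result of read_mbox('filename').
my @messages = (
    [ "From alice Tue Jul 16 18:53:00 2002\n", "Subject: test\n", "\n", "Hello\n" ],
);

# Dereferencing an array reference as a scalar reproduces the reported error.
my $error = eval { my $text = ${ $messages[0] }; 1 } ? '' : $@;
print "Error: $error";

# Dereferencing it as an array works.
my @texts = message_texts(@messages);
print $texts[0];
```

According to the MIME::Parser documentation, `parse_data` accepts a scalar or a reference to a scalar, so each of these strings could then be handed to it, for example as `$parser->parse_data(\$text)`. That call is left out here so the sketch runs with core Perl only, and it does nothing about the unquoted "From " lines that the `read_mbox` warning mentions.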
Re: Parsing a Unix Mbox
by rah (Monk) on Jul 17, 2002 at 01:15 UTC
    I've had pretty good success with Mail::Procmail. I don't know if you're familiar with the procmail program, but it provides a simple, rule-based means of parsing mail. You can build rules based on header lines, like To/From, and you can also look for matches in the body or subject line. Based on your rules, you can file, forward, reply, etc., or strip off the message body to a separate text file (it sounds like this is the capability you're looking for). I have it running in a couple of production applications and it has performed flawlessly. Fair warning, though: some people absolutely detest the grammar and syntax of procmail rules. In fact, the mere mention of the word procmail can start 'religious' wars.
