Re: How to get started with scraping my IMAP emails

Thank you all for your insights and shared code. I have not replied until now because I am still struggling with this. Thanks to your input, I have solved the first part, fetching a message from the server. I do something like this:

use strict;
use warnings;
use Mail::IMAPClient;
use MIME::Parser;

# Server, User and Password are all given, so new() also connects and
# logs in; on failure it returns undef and the reason is in $@.
my $imap = Mail::IMAPClient->new(
  Server   => 'abc.com',
  User     => 'xxx',
  Password => 'xxx',
  Ssl      => 1,
  Uid      => 1,   # message ids below are UIDs, not sequence numbers
#  Starttls => 1,
) or die "Could not connect/login: $@\n";

my $folders = $imap->folders
  or die "List folders error: ", $imap->LastError, "\n";
print "Folders: @$folders\n";

$imap->select( 'INBOX' )
  or die "Select 'INBOX' error: ", $imap->LastError, "\n";

# IMAP search strings containing spaces must be quoted; Quote() does that.
my $list = $imap->search( 'SUBJECT', $imap->Quote('a new email') );
for my $msgid ( @{ $list || [] } ) {    # guard against an undef result
    my $from = $imap->get_header( $msgid, "From" )    // '';
    my $subj = $imap->get_header( $msgid, "Subject" ) // '';
    print "Message $msgid from $from: $subj\n";

    # body_string() returns the body WITHOUT the headers, so MIME::Parser
    # never saw 'Content-type: multipart/mixed' and saved everything
    # (text and attachments together) in one big '.txt' file.
    # message_string() returns the complete message, headers included.
    my $raw = $imap->message_string($msgid)
      or die "Fetch error: ", $imap->LastError, "\n";

    my $parser = MIME::Parser->new();
    $parser->output_to_core(0);           # write decoded parts to disk
    $parser->extract_nested_messages(1);
    $parser->output_under('./out');       # one subdirectory per message
    my $entity = $parser->parse_data($raw);
    # With the headers present, $entity->parts lists the MIME parts.
    print "  part: ", $_->mime_type, "\n" for $entity->parts;
}

$imap->logout;

I am still struggling with the second part: unpacking a message to local disk, with each attachment in its own file. I am looking for a way to solve this seemingly simple (and surely long-solved) problem, either with MIME::Parser or with some other package. Alas, the prospects look bleak.
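
A minimal sketch of that second part with MIME::Parser, assuming the complete raw message (headers included, as returned by Mail::IMAPClient's message_string()) is in $raw. The inline sample message, the ./out directory and the helper leaf_files() are illustrative assumptions, not code from this thread. output_dir() writes every decoded part directly into one directory, using the attachment's suggested filename where there is a safe one; each leaf part's file is then reachable through bodyhandle->path.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use MIME::Parser;

# Assumption: $raw holds one complete RFC 822 message, headers included.
# A tiny inline sample stands in for $imap->message_string($msgid).
my $raw = <<'EOM';
From: alice@example.com
To: bob@example.com
Subject: a new email
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

Hello Bob, see the attachment.
--XYZ
Content-Type: text/csv; name="data.csv"
Content-Disposition: attachment; filename="data.csv"

a,b
1,2
--XYZ--
EOM

my $outdir = './out';            # hypothetical output directory
make_path($outdir);

my $parser = MIME::Parser->new;
$parser->output_dir($outdir);    # every part goes directly into $outdir
$parser->extract_nested_messages(1);

my $entity = $parser->parse_data($raw);

# Hypothetical helper: recursively collect the on-disk path of each leaf part.
sub leaf_files {
    my ($ent) = @_;
    my @parts = $ent->parts;
    return map { leaf_files($_) } @parts if @parts;   # container: descend
    my $body = $ent->bodyhandle;
    return $body && defined $body->path ? ($body->path) : ();
}

my @files = leaf_files($entity);
print "$_\n" for @files;
```

Files from different messages can collide in a single directory; output_under(), as used in the original code, avoids that by creating a subdirectory per message.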

The above was put together with code from NERDVANA, Discipulus and talexb!

P.S. (edit): My bandwidth is very limited, so to test this I have set up a minimal mail server (Dovecot) on my Linux box, without SMTP or SSL (to keep things simple). I used Thunderbird to copy my multipart test email from a "real" email account's INBOX to the localhost dummy (using 'Copy To' in Thunderbird), and now I can do the testing without using the net or bothering my mail service provider. Of course, I could have just saved the email to a file and read it from there...
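
Reading a message saved to a file, as the P.S. suggests, needs only a few lines. A minimal sketch with Email::MIME, assuming the message was saved as a complete .eml file (headers included) whose path is given on the command line; the helper load_message() and the fallback sample written to a temporary file are hypothetical and only there so the sketch runs on its own.

```perl
use strict;
use warnings;
use Email::MIME;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a message saved to disk and parse it.
sub load_message {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $raw = do { local $/; <$fh> };     # read the whole file at once
    close $fh;
    return Email::MIME->new($raw);
}

my $path = shift @ARGV;
unless (defined $path) {
    # No file given: write a tiny sample message to a temporary file.
    my $fh;
    ($fh, $path) = tempfile(SUFFIX => '.eml', UNLINK => 1);
    print {$fh} "Subject: test\nContent-Type: text/plain\n\nhello\n";
    close $fh;
}

my $email = load_message($path);
print "Subject: ", $email->header_str('Subject'), "\n";
print "Type:    ", $email->content_type, "\n";
```

Run as `perl load.pl saved.eml` (script and file names are placeholders); the same parsing code then works unchanged whether the raw text came from disk or from message_string().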


Re^2: How to get started with scraping my IMAP emails by Corion (Patriarch) on Mar 01, 2022 at 16:16 UTC
In my (not really elegant, not really recommended) approaches, I recursively descend the MIME message tree and usually print the Content-Type of each part, to get a first view of the mail structure:

use feature 'signatures';
no warnings 'experimental::signatures';

# $entity is a MIME::Entity; mime_type() returns the part's Content-Type
sub dump_parts($msg, $level=0) {
    print " " x $level, $msg->mime_type, "\n";
    for my $part ($msg->parts) {
        dump_parts($part, $level+1);
    }
}
dump_parts( $entity );

Then I usually modify dump_parts to actually handle the content types (and other criteria) of the parts I'm interested in.

This discussion has given me the idea that an SQL-, XPath- or CSS-like query language for the parts might improve things, but so far I haven't come up with a concept good enough to implement.
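
The "modify dump_parts to actually handle the content types" step could look roughly like this: a minimal sketch, assuming a MIME::Entity tree such as the one MIME::Parser returns. The %handler dispatch table, its two handlers and the in-memory sample message built with MIME::Entity->build are hypothetical illustrations, not Corion's actual code.

```perl
use strict;
use warnings;
use MIME::Entity;

my @unhandled;    # MIME types that no handler claimed

# Hypothetical handlers keyed by MIME type; extend as needed.
my %handler = (
    'text/plain' => sub {
        my ($part) = @_;
        print "text: ", $part->bodyhandle->as_string;
    },
    'text/html' => sub { print "skipping an HTML part\n" },
);

# Recursive walk in the spirit of dump_parts(): containers are descended
# into, leaf parts are dispatched on their MIME type.
sub handle_parts {
    my ($ent, $level) = @_;
    $level //= 0;
    my $type  = $ent->mime_type;
    my @parts = $ent->parts;
    if (@parts) {
        print "  " x $level, "$type\n";
        handle_parts($_, $level + 1) for @parts;
    }
    elsif (my $code = $handler{$type}) {
        $code->($ent);
    }
    else {
        push @unhandled, $type;
    }
}

# A small in-memory message stands in for one parsed from IMAP.
my $top = MIME::Entity->build(
    Type    => 'multipart/mixed',
    From    => 'alice@example.com',
    To      => 'bob@example.com',
    Subject => 'sample',
);
$top->attach(Type => 'text/plain', Data => "hello\n");
$top->attach(Type => 'image/png', Encoding => 'base64', Data => "not really a png");

handle_parts($top);
print "unhandled: @unhandled\n";
```

Keying on the exact MIME type keeps each handler small, but, as the next reply points out, it relies on mail clients labelling the same content consistently.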
Re^3: How to get started with scraping my IMAP emails by bliako (Monsignor) on Mar 01, 2022 at 20:05 UTC
Ouch! Can you trust all those email apps to map the same content to the same MIME content type consistently? In the meantime, I went back to Email::MIME and had good results (for my one multipart test email) with its walk_parts():

use Mail::IMAPClient;
use Email::MIME;

my $client = Mail::IMAPClient->new(...);
# ... search mail box
my $parsed = Email::MIME->new($client->message_string($msgid));
my @parts_to_save;
$parsed->walk_parts(sub { push @parts_to_save, $_[0] });
# $parts_to_save[0] is the whole message; the rest are all parts, including nested ones
for (@parts_to_save) { print $_->as_string }

Email::MIME also has a test, t/nested-parts.t, which I used to check that it works fine for nested parts. And it seems I am leaving the dreadful world of email.
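
To get from walk_parts() to "each attachment in its own file", one option is to skip the multipart containers and write every leaf part's decoded body to disk. A minimal sketch, assuming Email::MIME's subparts(), filename() and body() methods; the ./parts directory, the inline sample message and the filename clean-up rules are hypothetical choices, and name collisions between parts are not handled.

```perl
use strict;
use warnings;
use Email::MIME;
use File::Path qw(make_path);
use File::Spec;

# Inline sample; in practice this would be $client->message_string($msgid).
my $raw = <<'EOM';
From: alice@example.com
Subject: a new email
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

Hello Bob, see the attachment.
--XYZ
Content-Type: text/csv; name="data.csv"
Content-Disposition: attachment; filename="data.csv"

a,b
1,2
--XYZ--
EOM

my $outdir = './parts';    # hypothetical output directory
make_path($outdir);

my $parsed = Email::MIME->new($raw);
my @saved;
$parsed->walk_parts(sub {
    my ($part) = @_;
    my @children = $part->subparts;
    return if @children;                  # multipart container: nothing to save

    my $name = $part->filename(1);        # header filename, or an invented one
    $name =~ s{.*[/\\]}{};                # drop any directory components
    $name =~ s/[^\w.\-]/_/g;              # keep the name filesystem-safe
    $name = 'part-' . scalar(@saved) if $name =~ /^\.*$/;

    my $path = File::Spec->catfile($outdir, $name);
    open my $fh, '>:raw', $path or die "Cannot write $path: $!\n";
    print {$fh} $part->body;              # body() undoes base64/quoted-printable
    close $fh or die "Cannot close $path: $!\n";
    push @saved, $path;
});

print "$_\n" for @saved;
```

Passing a true value to filename() makes Email::MIME invent a name for parts that carry none (such as the plain-text body here), so every leaf part ends up in a file of its own.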

