
In my (not really elegant, not really recommended) approaches, I recursively descend down the MIME message tree and usually output the Content-Type headers, to get a first view of the mail structure:

    sub dump_parts($msg, $level=0) {
        print " " x $level, $msg->content_type, "\n";
        for my $part ($msg->parts) {
            dump_parts($part, $level+1);
        }
    }

    dump_parts( $entity );

Then, I usually modify dump_parts to actually handle the content types (and other criteria) of the parts I'm interested in.
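
As a rough sketch of what such a modified walker could look like, the following collects the parts whose content type is in a wanted set. It assumes Email::MIME as the parser (the $entity above may well come from a different module), and it recurses over subparts() rather than parts(), because Email::MIME's parts() returns the message itself for a single-part message. The name collect_parts and the sample message are made up for illustration.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical sample message, only here to make the sketch runnable.
my $raw = join "\n",
    'From: a@example.com',
    'To: b@example.com',
    'Subject: test',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="outer"',
    '',
    '--outer',
    'Content-Type: multipart/alternative; boundary="inner"',
    '',
    '--inner',
    'Content-Type: text/plain; charset="us-ascii"',
    '',
    'Hello in plain text',
    '--inner',
    'Content-Type: text/html; charset="us-ascii"',
    '',
    '<p>Hello in HTML</p>',
    '--inner--',
    '--outer',
    'Content-Type: application/octet-stream; name="data.bin"',
    'Content-Disposition: attachment; filename="data.bin"',
    'Content-Transfer-Encoding: base64',
    '',
    'AAEC',
    '--outer--',
    '';

# Walk the tree like dump_parts, but keep the parts we care about.
# collect_parts is a hypothetical helper, not part of Email::MIME.
sub collect_parts {
    my ($msg, $wanted, $found) = @_;
    $found //= [];

    # Reduce the header to "type/subtype": drop charset, boundary, etc.
    my $type = lc($msg->content_type // 'text/plain');
    $type =~ s/;.*//s;
    $type =~ s/\s+//g;

    push @$found, $msg if $wanted->{$type};

    # subparts() is empty for a leaf part, so the recursion terminates.
    collect_parts($_, $wanted, $found) for $msg->subparts;
    return $found;
}

my $entity = Email::MIME->new($raw);
my $hits   = collect_parts($entity, {
    'text/plain'               => 1,
    'application/octet-stream' => 1,
});

# body() returns the part with its transfer encoding already decoded.
printf "%s (%d bytes)\n", $_->content_type, length($_->body) for @$hits;
```

Matching on the bare type/subtype keeps the lookup table simple; other criteria (filename, disposition, size) could be added as further checks before the push.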

This discussion has given me the idea that maybe having an SQL, XPath or CSS-like query language for the parts could improve things, but so far, I haven't come up with a good enough concept to implement this.

Re^3: How to get started with scraping my IMAP emails
by bliako (Monsignor) on Mar 01, 2022 at 20:05 UTC

    Ouch! Can you trust all those email apps to map the same content to the same MIME content type consistently?

    In the meantime I went back to Email::MIME and had good results (for my one multipart test email) with its walk_parts().

        my $client = Mail::IMAPClient->new(...);
        # ... search mail box
        my $parsed = Email::MIME->new($client->message_string($msgid));
        my @parts_to_save;
        $parsed->walk_parts(sub { push @parts_to_save, $_[0] });
        # the [0] is the whole message, rest are all parts including nested
        for (@parts_to_save){ print $_->as_string }

    Email::MIME also has a test file, t/nested-parts.t, which I used to check that this works fine for nested parts.
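
    A possible variation, sketched under the assumption that only the leaf parts are worth saving: the callback can skip any part that has subparts, so the multipart containers (including the whole message) are left out. The sample message below is hypothetical and stands in for the string that $client->message_string($msgid) would return.

    ```perl
    use strict;
    use warnings;
    use Email::MIME;

    # Hypothetical message, in place of $client->message_string($msgid).
    my $raw = join "\n",
        'From: a@example.com',
        'Subject: report',
        'MIME-Version: 1.0',
        'Content-Type: multipart/mixed; boundary="outer"',
        '',
        '--outer',
        'Content-Type: multipart/alternative; boundary="inner"',
        '',
        '--inner',
        'Content-Type: text/plain; charset="us-ascii"',
        '',
        'See attached report.',
        '--inner--',
        '--outer',
        'Content-Type: text/csv; name="report.csv"',
        'Content-Disposition: attachment; filename="report.csv"',
        '',
        'a,b',
        '--outer--',
        '';

    my $parsed = Email::MIME->new($raw);

    my @leaves;
    $parsed->walk_parts(sub {
        my ($part) = @_;
        return if $part->subparts;    # skip multipart containers
        push @leaves, $part;
    });

    for my $part (@leaves) {
        # filename() is undef when the part declares no name.
        my $name = $part->filename // '(no filename)';
        printf "%s: %s, %d bytes\n",
            $name, $part->content_type, length($part->body);
    }
    ```

    Using body() instead of as_string gives the decoded content without the part's own headers, which is usually what you want when writing attachments to disk.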

    And it seems I am leaving the dreadful world of email.