PerlMonks  

Re^2: How to get started with scraping my IMAP emails

by Corion (Patriarch)
on Mar 01, 2022 at 16:16 UTC


in reply to Re: How to get started with scraping my IMAP emails
in thread How to get started with scraping my IMAP emails

In my (not really elegant, not really recommended) approaches, I recursively descend the MIME message tree and usually print the Content-Type header of each part, to get a first view of the mail structure:

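    use feature 'signatures';               # subroutine signatures need Perl 5.20 or later
    no warnings 'experimental::signatures';

    # Print the content type of $msg, indented by nesting depth, then recurse.
    sub dump_parts($msg, $level=0) {
        print " " x $level, $msg->content_type, "\n";
        for my $part ($msg->parts) {
            dump_parts($part, $level+1);
        }
    }

    dump_parts( $entity );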

Then, I usually modify dump_parts to actually handle the content types (and other criteria) of the parts I'm interested in.
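As a rough illustration of that second step, here is a minimal sketch of a dump_parts variant that dispatches on the content type. It is not the code from the post: the Demo::Part class is a hypothetical stand-in that offers only the content_type and parts methods used above, so the example runs without any CPAN module, and the %handler table, the handle_parts name and the sample message tree are likewise assumptions made for illustration.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical stand-in for a parsed MIME part. It provides only the two
# methods the traversal relies on: content_type and parts.
package Demo::Part {
    sub new($class, $type, @children) {
        return bless { type => $type, children => \@children }, $class;
    }
    sub content_type($self) { $self->{type} }
    sub parts($self)        { @{ $self->{children} } }
}

# Handlers keyed by bare media type (parameters such as charset removed).
my %handler = (
    'text/plain'      => sub ($part, $found) { push @$found, 'body text' },
    'application/pdf' => sub ($part, $found) { push @$found, 'PDF attachment' },
);

sub handle_parts($msg, $found, $level = 0) {
    # Reduce "text/plain; charset=utf-8" to "text/plain" before the lookup.
    my ($type) = lc($msg->content_type) =~ m{^\s*([^;\s]+)};
    $type //= '';
    print "  " x $level, $type, "\n";

    if (my $cb = $handler{$type}) {
        $cb->($msg, $found);
    }
    handle_parts($_, $found, $level + 1) for $msg->parts;
    return $found;
}

# Sample tree: mixed -> (alternative -> plain, html), pdf
my $entity = Demo::Part->new('multipart/mixed',
    Demo::Part->new('multipart/alternative',
        Demo::Part->new('text/plain; charset=utf-8'),
        Demo::Part->new('text/html; charset=utf-8'),
    ),
    Demo::Part->new('application/pdf; name="invoice.pdf"'),
);

my $found = handle_parts($entity, []);
print join(", ", @$found), "\n";
```

One caveat when swapping in a real parser: with Email::MIME, parts returns the object itself for a single-part message, so a recursive walk like this should call subparts instead, which returns an empty list in that case.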

This discussion has given me the idea that maybe having an SQL, XPath or CSS-like query language for the parts could improve things, but so far, I haven't come up with a good enough concept to implement this.

Re^3: How to get started with scraping my IMAP emails
by bliako (Monsignor) on Mar 01, 2022 at 20:05 UTC

    Ouch! Can you trust all those email apps to map the same content to the same MIME content type consistently?

    In the meantime I went back to Email::MIME and had good results (for my one multipart test email) with its walk_parts().

        my $client = Mail::IMAPClient->new(...);
        # ... search mail box
        my $parsed = Email::MIME->new($client->message_string($msgid));
        my @parts_to_save;
        $parsed->walk_parts(sub {
            push @parts_to_save, $_[0]
        });
        # the [0] is the whole message, rest are all parts including nested
        for (@parts_to_save){ print $_->as_string }

    Email::MIME also has a t/nested-parts.t, which I used to check that it works fine for nested parts.
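For readers who want to try the nested case without an IMAP server, the following is a small sketch along the same lines. It assumes the CPAN module Email::MIME is installed; the raw message, its boundaries and its addresses are hand-written placeholders, not data from this thread. Because walk_parts also visits the top-level message and every multipart container, the callback here keeps only parts that have no subparts.

```perl
use strict;
use warnings;
use Email::MIME;   # CPAN module, not part of core Perl

# Hypothetical hand-written message: a multipart/mixed that holds a nested
# multipart/alternative plus one more plain-text part.
my $raw = join "\r\n",
    'From: a@example.com',
    'Subject: nested demo',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="outer"',
    '',
    '--outer',
    'Content-Type: multipart/alternative; boundary="inner"',
    '',
    '--inner',
    'Content-Type: text/plain',
    '',
    'plain body',
    '--inner',
    'Content-Type: text/html',
    '',
    '<p>html body</p>',
    '--inner--',
    '--outer',
    'Content-Type: text/plain',
    '',
    'second part',
    '--outer--',
    '';

my $parsed = Email::MIME->new($raw);

my @leaves;
$parsed->walk_parts(sub {
    my ($part) = @_;
    return if $part->subparts;   # skip containers, including the top-level message
    push @leaves, $part;
});

print scalar(@leaves), " leaf parts\n";
print $_->content_type, "\n" for @leaves;
```

Filtering on subparts rather than on position in the list avoids having to remember which index is the whole message.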

    And it seems I am leaving the dreadful world of email.
