parsing mime emails (revised!)

by downer (Monk)
on Mar 25, 2009 at 23:07 UTC

downer has asked for the wisdom of the Perl Monks concerning the following question:

I have been able to parse a MIME message into a MIME entity and get what I want from the header. The problem is: what do I do now with the one or more parts that result? I am trying to get just the raw terms from an email. How do I take a set of MIME parts and combine just their contents from the entity structure? Here is a sample email file that I am feeding in via STDIN:

From armoraareoo@t-dialin.net Sun Apr 8 16:11:45 2007
Return-Path: <armoraareoo@t-dialin.net>
Received: from plg2.math.uwaterloo.ca (plg2.math.uwaterloo.ca [129.97.186.80])
    by speedy.uwaterloo.ca (8.12.8/8.12.5) with ESMTP id l38KBj0I004827
    for <theplg@speedy.uwaterloo.ca>; Sun, 8 Apr 2007 16:11:45 -0400
Received: from t-dialin.net (p508ee6ed.dip.t-dialin.net [80.142.230.237])
    by plg2.math.uwaterloo.ca (8.13.8/8.13.8) with SMTP id l38KAt7e009862;
    Sun, 8 Apr 2007 16:11:01 -0400 (EDT)
Message-ID: <2fee01c779eb$fb400220$c15f4e5d@armoraareoo>
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Cc: "Dorcas" <migod@plg2.math.uwaterloo.ca>,
    "Misty" <holt@plg2.math.uwaterloo.ca>,
    "Rosalia" <dsvetinovic@plg2.math.uwaterloo.ca>,
    "Bart Shaw" <y5guo@plg2.math.uwaterloo.ca>,
    "Alexia Myers" <the00@plg2.math.uwaterloo.ca>,
    "Lona Gomez" <adtrevors@plg2.math.uwaterloo.ca>,
    "Caridad Sims" <elterra@plg2.math.uwaterloo.ca>
Subject: How r u lately
Date: Sun, 08 Apr 2007 14:41:24 -0500
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB"
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2462.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2462.0000
X-Miltered: at mailchk-m02 with ID 46194C50.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Virus-Scanned: ClamAV version 0.90.1, clamav-milter version 0.90.1 on localhost
X-Virus-Status: Clean
X-UUID: 3e328b2a-cdb4-49f8-94ce-feeb89b85d5d
Status: O
Content-Length: 21559
Lines: 322

This is a multi-part message in MIME format.

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: multipart/alternative;
    boundary="----=_NextPart_CA0_4C28_95CE35A4.E636E095"

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

same document...

------=_NextPart_CA0_4C28_95CE35A4.E636E095--

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64
Content-ID: <5627001c779eb7fbaa0e902503734a@armoraareoo>

image stuff...

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB--

Now, how can I ignore the image part and find the nested subparts? I have tried setting the flag $parser->parse_nested_messages(1); but this doesn't seem to do anything when I call $entity->dump_skeleton; to check the layout of the parts. Here is my code to get the entity:
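
#!/usr/bin/perl
use strict;
use warnings;

use Email::AddressParser;
use Data::Dumper;
use MIME::Parser;

# Slurp the whole message from STDIN (or from files named on the command line).
my $message = do { local $/; <> };

my $parser = MIME::Parser->new;
$parser->tmp_to_core(1);
$parser->parse_nested_messages(1);

my $entity = $parser->parse_data($message);
$entity->dump_skeleton;

my $head = $entity->head;

# get() returns the field with its trailing newline; chomp removes only that.
my $subject = $head->get('Subject', 0);
chomp($subject) if defined $subject;

my $to = $head->get('To', 0);
chomp($to) if defined $to;

# Reduce the To: field to the first bare address, if one can be parsed.
my @addresses = Email::AddressParser->parse($to);
$to = $addresses[0]->address if @addresses;

# In scalar context, parts() gives the number of top-level parts.
my $num_parts = $entity->parts;
print "$subject\t$to\t$num_parts\n";

# Remove any files the parser wrote for this message.
$entity->purge;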

Replies are listed 'Best First'.
Re: parsing mime emails
by moritz (Cardinal) on Mar 25, 2009 at 23:19 UTC
    MIME::Parser: can't flush: No space left on device

    So it tries to write on a partition that's full. On a unixish system you can find out which one is full by running df.
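
If the full partition is the one MIME::Parser writes its decoded parts to, one possible workaround is to keep everything in memory. This is only a minimal sketch, assuming the messages are small enough to fit in RAM; the one-part message is made up for illustration.

```perl
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);   # decoded bodies are held in memory, not in files
$parser->tmp_to_core(1);      # temporary buffers stay in memory as well

# Made-up single-part message, just to exercise the parser.
my $entity = $parser->parse_data(<<'EOM');
Subject: test
Content-Type: text/plain

hello
EOM

my $body = $entity->bodyhandle;

# An in-core body has no file path behind it.
print defined $body->path ? "on disk: " . $body->path . "\n" : "in core\n";
print $body->as_string;
```

With bodies held in core there is nothing on disk for the parser to flush, so the size of the output partition no longer matters for small messages.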

Re: parsing mime emails (revised!)
by ig (Vicar) on Mar 28, 2009 at 18:35 UTC
    Now, how can I ignore the image part and find the nested subparts?

    The following visits every part of an entity, including the nested ones, and skips any part whose MIME type is 'image/gif':

    foreach my $part ($entity->parts_DFS) {
        my $type = $part->mime_type;
        next if $type eq 'image/gif';
        print "Here's a part: $type\n";
        # do what you want with the part here
    }
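
To get from that loop to the "raw terms" the question asks for, one option is to keep only the text/plain leaves and join their decoded bodies. The sketch below is an illustration, not a definitive answer: `plain_text_of` is a hypothetical helper name, the message is a cut-down stand-in for the sample in the question (short boundaries, a dummy base64 payload), and splitting on non-word characters is just one assumed definition of a "term".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: decoded text of every text/plain part in the tree.
sub plain_text_of {
    my ($entity) = @_;
    my @chunks;
    for my $part ($entity->parts_DFS) {          # the entity itself, then all descendants
        next unless $part->mime_type eq 'text/plain';
        my $body = $part->bodyhandle or next;    # multipart containers have no body
        push @chunks, $body->as_string;          # transfer encoding is already undone
    }
    return join "\n", @chunks;
}

# Cut-down stand-in for the sample message in the question.
my $message = <<'EOM';
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Subject: How r u lately
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="outer"

This is a multi-part message in MIME format.

--outer
Content-Type: multipart/alternative;
    boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document

--inner
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<p>same document...</p>

--inner--

--outer
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=

--outer--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);   # assumption: messages are small enough to hold in memory
$parser->tmp_to_core(1);

my $entity = $parser->parse_data($message);
my $text   = plain_text_of($entity);

# Assumed notion of a "term": lowercase runs of word characters.
my @terms = grep { length } split /\W+/, lc $text;
print "@terms\n";
```

Selecting on text/plain rather than skipping image/gif means the HTML alternative and any other attachment types are ignored too, so the same document is not counted twice.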
