I have been able to transform a MIME message into a MIME entity and get what I want from the header. The problem is: what do I now do with the one or more parts that result? I am trying to get just the raw terms from an email. How do I take a set of MIME parts and combine just their contents from the entity structure? Here is a sample email file, which I am feeding in via STDIN:
From armoraareoo@t-dialin.net Sun Apr 8 16:11:45 2007
Return-Path: <armoraareoo@t-dialin.net>
Received: from plg2.math.uwaterloo.ca (plg2.math.uwaterloo.ca [129.97.186.80])
    by speedy.uwaterloo.ca (8.12.8/8.12.5) with ESMTP id l38KBj0I004827
    for <theplg@speedy.uwaterloo.ca>; Sun, 8 Apr 2007 16:11:45 -0400
Received: from t-dialin.net (p508ee6ed.dip.t-dialin.net [80.142.230.237])
    by plg2.math.uwaterloo.ca (8.13.8/8.13.8) with SMTP id l38KAt7e009862;
    Sun, 8 Apr 2007 16:11:01 -0400 (EDT)
Message-ID: <2fee01c779eb$fb400220$c15f4e5d@armoraareoo>
From: "Drew" <armoraareoo@t-dialin.net>
To: "Lynsey Harvey" <dmason@plg2.math.uwaterloo.ca>
Cc: "Dorcas" <migod@plg2.math.uwaterloo.ca>,
    "Misty" <holt@plg2.math.uwaterloo.ca>,
    "Rosalia" <dsvetinovic@plg2.math.uwaterloo.ca>,
    "Bart Shaw" <y5guo@plg2.math.uwaterloo.ca>,
    "Alexia Myers" <the00@plg2.math.uwaterloo.ca>,
    "Lona Gomez" <adtrevors@plg2.math.uwaterloo.ca>,
    "Caridad Sims" <elterra@plg2.math.uwaterloo.ca>
Subject: How r u lately
Date: Sun, 08 Apr 2007 14:41:24 -0500
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB"
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2462.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2462.0000
X-Miltered: at mailchk-m02 with ID 46194C50.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Virus-Scanned: ClamAV version 0.90.1, clamav-milter version 0.90.1 on localhost
X-Virus-Status: Clean
X-UUID: 3e328b2a-cdb4-49f8-94ce-feeb89b85d5d
Status: O
Content-Length: 21559
Lines: 322

This is a multi-part message in MIME format.

------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: multipart/alternative;
    boundary="----=_NextPart_CA0_4C28_95CE35A4.E636E095"

------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document
------=_NextPart_CA0_4C28_95CE35A4.E636E095
Content-Type: text/html;
    charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

same document...
------=_NextPart_CA0_4C28_95CE35A4.E636E095--
------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB
Content-Type: image/gif;
    name="sumorg.gif"
Content-Transfer-Encoding: base64
Content-ID: <5627001c779eb7fbaa0e902503734a@armoraareoo>

image stuff...
------=_NextPart_DAF_AA5B_FF2BEB78.AF9733DB--
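For combining just the contents of the parts, one possible approach is a minimal sketch that assumes MIME::Tools (MIME::Parser and MIME::Entity) is installed. It walks every entity depth-first with parts_DFS and skips anything whose effective_type is not text/plain, which drops both the image/gif and the duplicate text/html alternative. For each remaining leaf, it keeps the already-decoded body from bodyhandle. The helper name collect_text and the trimmed inline message with the made-up boundaries OUTER and INNER are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return the decoded body of every text/plain leaf,
# in depth-first order, skipping containers, images and other non-text parts.
sub collect_text {
    my ($entity) = @_;
    my @texts;
    for my $part ($entity->parts_DFS) {       # the entity itself, then all descendants
        next unless $part->effective_type eq 'text/plain';
        my $body = $part->bodyhandle or next;  # leaves carry a MIME::Body
        push @texts, $body->as_string;         # transfer encoding already decoded
    }
    return @texts;
}

# Trimmed copy of the sample message; the short boundaries are made up.
my $message = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/related; type="multipart/alternative"; boundary="OUTER"

This is a multi-part message in MIME format.

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

part one of the document
--INNER
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<p>same document...</p>
--INNER--
--OUTER
Content-Type: image/gif; name="sumorg.gif"
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--OUTER--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep part bodies in memory instead of writing files
my $entity = $parser->parse_data($message);

# Join the plain-text bodies into one string of content.
my @texts = collect_text($entity);
print join("\n", @texts), "\n";
```

If a message is HTML-only, this returns nothing, so you might fall back to text/html and strip the markup as a separate step.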
Now, how can I ignore the image part and get at the nested subparts? I have tried setting the flag
$parser->parse_nested_messages(1);
but this doesn't seem to change anything when I call $entity->dump_skeleton; to check the layout of the parts. Here is my code to get the entity:
#!/usr/bin/perl
use strict;
use warnings;
use Email::AddressParser;
use Data::Dumper;
use MIME::Parser;

# Slurp the whole message from STDIN (or from files named on the command line).
my $message = do { local $/; <> };

my $parser = MIME::Parser->new;
$parser->tmp_to_core(1);
$parser->parse_nested_messages(1);
my $entity = $parser->parse_data($message);
$entity->dump_skeleton;

my $head = $entity->head;

# get() returns the raw header value with its trailing newline, or undef if absent.
my $subject = $head->get('Subject', 0);
$subject = '' unless defined $subject;
chomp $subject;

my $to = $head->get('To', 0);
$to = '' unless defined $to;
chomp $to;

my @addresses = Email::AddressParser->parse($to);
$to = $addresses[0]->address if @addresses;

my $num_parts = $entity->parts;    # scalar context: number of top-level parts
print "$subject\t$to\t$num_parts\n";
$entity->purge;
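On the parse_nested_messages flag: as I read the MIME::Parser documentation, it only controls whether embedded message/rfc822 (and related message/*) parts are parsed into entities, and newer MIME-tools releases spell it extract_nested_messages. It therefore should not affect this sample. multipart/* bodies such as the nested multipart/alternative are split recursively regardless, and the sub-parts are reached through $entity->parts rather than through a flag. The following sketch is a hedged illustration with a small made-up message and two hypothetical helpers. show_tree prints the entity tree by recursing over parts. raw_terms splits the text/plain content into lowercase word tokens; splitting on \W+ is only an assumed definition of "raw terms".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: print one line per entity, indented by nesting depth.
sub show_tree {
    my ($entity, $depth) = @_;
    $depth = 0 unless defined $depth;
    print '  ' x $depth, $entity->effective_type, "\n";
    show_tree($_, $depth + 1) for $entity->parts;    # recurse into sub-parts
}

# Hypothetical helper: split every text/plain body into lowercase word tokens.
# Splitting on \W+ is an assumed definition of "raw terms".
sub raw_terms {
    my ($entity) = @_;
    my @terms;
    for my $part ($entity->parts_DFS) {
        next unless $part->effective_type eq 'text/plain';
        my $body = $part->bodyhandle or next;
        push @terms, map { lc $_ } grep { length $_ } split /\W+/, $body->as_string;
    }
    return @terms;
}

# A small made-up multipart/alternative message for illustration.
my $message = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="ALT"

--ALT
Content-Type: text/plain; charset="us-ascii"

Hello, how r u lately?
--ALT
Content-Type: text/html; charset="us-ascii"

<p>Hello, how r u lately?</p>
--ALT--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep part bodies in memory instead of writing files
my $entity = $parser->parse_data($message);

show_tree($entity);
print join(' ', raw_terms($entity)), "\n";
```

Recursing over parts yourself (or using parts_DFS) gives the same nesting that dump_skeleton shows, but lets you decide per part whether to keep or skip it.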