http://qs321.pair.com?node_id=25328
Category: Text Processing
Author/Contact Info: Tim Lewis (LewisT@UAH.EDU)
Description: PINE is a common text-based email viewer on many UNIX systems. The PINE program stores email in large text files, which makes it very handy for archiving your old email... except that there's no table of contents at the beginning of the file to tell you which messages are stored there. This script solves that problem by parsing the PINE email store and creating a separate table of contents from the headers of each email. The resulting TOC lists the message number, subject, sender and date in formatted columns. I usually concatenate the TOC and the email storage file, then save the resulting file in my email archives.
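The concatenation step can also be done from Perl. The sketch below is a hypothetical helper (concat_files is not part of pinetoc); on UNIX, "cat toc folder > archive" does the same job.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pinetoc): copy the TOC file and then the
# PINE folder into a single archive file, like "cat toc folder > archive".
sub concat_files {
    my ($archive, @inputs) = @_;
    open(my $out, '>', $archive) or die "Could not open $archive: $!\n";
    for my $file (@inputs) {
        open(my $in, '<', $file) or die "Could not open $file: $!\n";
        print {$out} $_ while <$in>;    # copy every line unchanged
        close $in;
    }
    close $out or die "Could not write $archive: $!\n";
    return;
}
```

For example, concat_files('archive.txt', 'toc.txt', 'saved-messages') (hypothetical file names) writes the TOC first, so it ends up at the top of the archive.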

Note: This script works well with PINE 3.96, the version I use; I have not tested it with other versions.

PLEASE comment on this code. I'm a fairly new Perl programmer and would appreciate feedback on how to improve my programming.
#!/usr/bin/perl

use warnings;
use strict;

if (@ARGV != 2) {
    die "Usage: pinetoc inputfile outputfile\n";
}
open (INFILE, '<', $ARGV[0]) or die "Could not open input file '$ARGV[0]': $!\n";
open (OUTFILE, '>', $ARGV[1]) or die "Could not open output file '$ARGV[1]': $!\n";

##### Variables #####
my $From = "";        # Used to store the from address
my $Subject = "";        # Used to store the subject
my $Date = "";        # Used to store the message date
my $LetterNum = 0;        # Counts the number of emails
my $HeaderFlag = -1;        # Flag is < 0 when we're searching for a new email
                            # Flag is 0 to 6 while we're getting header info
                            #   (bits: 1 = From, 2 = Subject, 4 = Date)
                            # Flag is > 6 when we've found all the header info

##### Main Loop #####
while (<INFILE>){

    # Look for a new message (all messages have a header line beginning "X-UIDL: ")
    if (/^X-UIDL: \w{32}/) {

        if ($HeaderFlag >= 0) {
            # We haven't got all the header info yet... but we'll write anyway
            WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        }

        $LetterNum++;

        # Clear the message data variables
        $HeaderFlag = 0;
        $From = "";
        $Subject = "";
        $Date = "";
    }
    if ($HeaderFlag < 0) {
        # Do nothing -- already found the header info, so we're searching for a new letter
    }
    elsif ($HeaderFlag < 7) {
        if (/^From:/) {
            # Remove the "From: " tag, quotes, any <address> or [address], and the newline to isolate the name
            s/(From: |"|(\[|<)[^\]>]*(\]|>)|\n)//g;
            s/^\s+|\s+$//g;                # remove leading or trailing whitespace
            $From = $_;
            $HeaderFlag += 1;
        }
        elsif (/^Subject:/) {
            s/^Subject:\s*//;              # remove only the leading tag to isolate the subject
            s/^\s+|\s+$//g;                # remove leading or trailing whitespace (including the newline)
            $Subject = $_;
            if ($Subject eq "") {
                $Subject = "(Blank subject)";
            }
            $HeaderFlag += 2;
        }
        elsif (/^Date:/) {
            # Keep "Day, DD Mon YYYY"; use "" if the line doesn't fit that pattern
            ($Date) = /^Date:\s*(\w+, \w+ \w+ \w+)/;
            $Date = "" unless defined $Date;
            $HeaderFlag += 4;
        }
    }
    else {
        # We've got all the header info
        WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        $HeaderFlag = -1;
    }
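}

# If the file ended before the last message's TOC line was written, write it now
if ($HeaderFlag >= 0) {
    WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
}

close INFILE;
close OUTFILE or die "Could not write to output file!\n";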
exit 0;

##### Subroutine for writing the TOC #####
sub WriteTOCline {
    my($LetterNum, $From, $Subject, $Date, $HeaderFlag) = @_;
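    my %FieldBit = (From => 1, Subject => 2, Date => 4);    # bit added to $HeaderFlag for each field found

    my $ErrorNum = $HeaderFlag ^ 7;    # bits still set mark missing fields; values > 7 mean extra header lines

    if ($ErrorNum > 7) {
        print STDERR "Error: Too much header info in letter $LetterNum titled '$Subject'\n";
    }
    elsif ($ErrorNum > 0) {
        # List each missing field
        my @Missing = grep { $ErrorNum & $FieldBit{$_} } qw(From Subject Date);
        print STDERR "Error: Missing '", join("', '", @Missing), "' field(s) in message $LetterNum\n";
    }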
    
    # Write to output file (all cases)
    printf OUTFILE "%-4d  %-30.30s  %-20.20s  %-16.16s\n",
        $LetterNum, $Subject, $From, $Date or die "Could not write to output file!\n";
}
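The From: cleanup is the trickiest regex in the script, so here it is pulled out into a hypothetical helper, from_name, that can be tried on single header lines. Note the * after the character class [^\]>]: without it the pattern only matches a one-character address such as <x>, so a real <address> would stay in the TOC. The sample addresses are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: the From: cleanup from pinetoc's main loop,
# applied to a single header line instead of $_.
sub from_name {
    my ($line) = @_;
    # Remove the "From: " tag, double quotes, any <address> or [address]
    # (the * lets the class match the whole address), and newlines.
    $line =~ s/(From: |"|(\[|<)[^\]>]*(\]|>)|\n)//g;
    $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
    return $line;
}

print from_name(qq{From: "Tim Lewis" <tim\@example.com>\n}), "\n";   # prints: Tim Lewis
print from_name("From: alice\@example.com\n"), "\n";                # prints: alice@example.com
```

A bare address with no display name passes through unchanged, which is why the second line still shows the address.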
(jjhorner)PineTOC
by jjhorner (Hermit) on Aug 01, 2000 at 04:35 UTC

    Pretty good code, from just a quick peek, but even though you are declaring your variables, you aren't checking up on yourself with the warnings and strict pragmas.

    Please use them. Even experienced Perl programmers use them.
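    To see what the two pragmas actually catch, here is a small demonstration (the variable names are made up): under use strict an undeclared variable is a compile-time error, and under use warnings an undefined value used in a string triggers a run-time warning.

```perl
use strict;
use warnings;

# strict: compiling code that assigns to an undeclared variable fails,
# and the error message ends up in $@ instead of a global being created.
my $ok = eval q{
    use strict;
    $undeclared = 1;
    1;
};
print $ok ? "compiled\n" : "strict caught it: $@";

# warnings: using an undefined value in a string is reported at run time.
# The warning is captured with a __WARN__ handler so it can be printed.
{
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $name;
    my $greeting = "Hello, " . $name;
    print "warnings caught it: ", (@caught ? $caught[0] : "nothing\n");
}
```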

    "-w" (or "use warnings") and "use strict" are your friends.

    Cut-n-paste the following code and run it as your punishment.

    #!/usr/bin/perl -w
    use strict;

    my $i;
    for ($i = 0; $i < 100; $i++) {
        print "I will use strict and warnings.\n";
    }
    J. J. Horner
    Linux, Perl, Apache, Stronghold, Unix
    jhorner@knoxlug.org http://www.knoxlug.org/
    
      Thanks for your input!
      I updated my code, and ran my penance program like a good monk. =)
RE: PineTOC
by splinky (Hermit) on Aug 01, 2000 at 08:28 UTC
    First off, not a bad bit of code. I notice that you're checking the returns from your opens, which is a very good thing.

    I can't help but wonder why, in WriteTOCline, you take the two-step approach of sprintf followed by print instead of just using printf.
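    For reference, perldoc describes printf FILEHANDLE FORMAT, LIST as equivalent to print FILEHANDLE sprintf(FORMAT, LIST), except that $\ (the output record separator) is not appended. A small check, using the TOC format string from the script with made-up field values and writing into in-memory filehandles:

```perl
use strict;
use warnings;

# Same TOC line written two ways into in-memory filehandles (open on a
# scalar ref): sprintf-then-print versus a single printf.
my $fmt = "%-4d  %-30.30s  %-20.20s  %-16.16s\n";
my @fields = (1, 'Hello', 'Alice', 'Tue, 1 Aug 2000');

my ($via_sprintf, $via_printf) = ('', '');
open(my $fh1, '>', \$via_sprintf) or die "Cannot open in-memory file: $!";
open(my $fh2, '>', \$via_printf)  or die "Cannot open in-memory file: $!";

my $line = sprintf $fmt, @fields;    # two steps: format...
print {$fh1} $line;                  # ...then print
printf {$fh2} $fmt, @fields;         # one step

close $fh1;
close $fh2;
print $via_sprintf eq $via_printf ? "identical\n" : "different\n";   # prints: identical
```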

    And now, a few more Perlish ways to do a few things:

    You can shorten "if ($_ =~ /^X-UIDL: \w{32}/) {" to "if (/^X-UIDL: \w{32}/) {". $_ is implied on matches unless another variable is explicitly used.
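    Here is the implicit $_ at work in a loop shaped like the script's main loop. count_messages is a hypothetical helper, and the folder text is made up (32 'a's or 'b's stand in for a real UIDL).

```perl
use strict;
use warnings;

# Inside "while (<$fh>)" each line is stored in $_, so a bare /.../ match
# works on it without writing "$_ =~".
sub count_messages {
    my ($fh) = @_;
    my $count = 0;
    while (<$fh>) {
        $count++ if /^X-UIDL: \w{32}/;    # same as $_ =~ /^X-UIDL: \w{32}/
    }
    return $count;
}

my $folder = "X-UIDL: " . ('a' x 32) . "\nSubject: one\n"
           . "X-UIDL: " . ('b' x 32) . "\nSubject: two\n";
open(my $fh, '<', \$folder) or die "Cannot open in-memory folder: $!";
print count_messages($fh), "\n";    # prints: 2
```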

    All instances of $Variable = $Variable + n can be shortened to $Variable += n with no loss of readability to anyone who knows Perl (or C, for that matter).

    Probably the biggest change you could make, and one which would be very educational for you, would be to read RFC 822, which defines the format of email messages, and use that knowledge to set $/ (the input record separator) to something useful so that you could slurp up whole messages at a time instead of reading them one line at a time.
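    One way to act on the $/ suggestion, as a minimal sketch: it assumes the PINE folder uses the traditional Unix mbox layout, in which every message begins with a line starting "From " (an assumption, not checked against PINE 3.96). Because $/ is a literal string rather than a regex, "\nFrom " is used as the separator. read_headers is a hypothetical name, and the sample folder is made up.

```perl
use strict;
use warnings;

# Assumes mbox layout: each message starts with a line beginning "From ".
# Setting $/ to "\nFrom " makes each <$fh> read return one whole message.
sub read_headers {
    my ($fh) = @_;
    my @messages;
    local $/ = "\nFrom ";                      # separator = start of the next message
    while (my $msg = <$fh>) {
        chomp $msg;                            # chomp removes the trailing "\nFrom "
        my ($head) = split /\n\n/, $msg, 2;    # RFC 822: headers end at the first blank line
        $head =~ s/\n[ \t]+/ /g;               # unfold continuation lines
        my %h;
        for (split /\n/, $head) {
            $h{lc $1} = $2 if /^(From|Subject|Date):\s*(.*)$/i;
        }
        push @messages, \%h;
    }
    return @messages;
}

my $folder = <<'MBOX';
From alice@example.com Tue Aug  1 04:35:00 2000
From: Alice <alice@example.com>
Subject: Hello
Date: Tue, 1 Aug 2000 04:35:00 -0500

First body
From bob@example.com Tue Aug  1 08:28:00 2000
From: Bob <bob@example.com>
Subject: Re: Hello
  (folded line)
Date: Tue, 1 Aug 2000 08:28:00 -0500

Second body
MBOX

open(my $fh, '<', \$folder) or die "Cannot open in-memory folder: $!";
for my $m (read_headers($fh)) {
    printf "%-30.30s  %s\n", $m->{subject}, $m->{from};
}
```

    The envelope "From " line has no colon after "From", so the header regex never mistakes it for the From: header.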

    Finally, I'll rain on your parade a bit by telling you that you're reinventing the wheel. If you want the semi-official Perl package for handling email, have a look at Graham Barr's MailTools bundle.

    *Woof*

      Thanks for your input. I updated my code based on your comments.

      I originally used the "sprintf" followed by "print" because I didn't know the "printf" command would take formatting. Thanks for pointing this out!

      I've read parts of RFC 822 in the past, but I'm not sure how relevant it would be to this situation. PINE stores its messages with all the RFC headers, true, but is the PINE message store totally 822 compliant? Maybe it is, but I assume that PINE probably changes the formatting of the messages and headers slightly when it stores them. Certainly, the messages in the store don't end in a single period on a line by itself (the signal for the end of an SMTP email). Still, I'm sure you're right in saying that there is a more efficient way to "slurp up whole messages".

      Thanks for the reference to MailTools. I'll take a look.

      Tally