http://qs321.pair.com?node_id=449873
Category: Win32 Stuff
Author/Contact Info corion-pm@corion.net
Description:

This script extracts a Lotus Notes database into separate HTML files. As I only use the Lotus Notes R4.5 client, there are many deficiencies:

  • The maximum message number is hardcoded
  • Embedded graphics get lost
  • Embedded formatting information gets lost

"Other than that", the information gets saved into an HTML::Template formatted file (see separate reply).


use strict;
use Win32::OLE qw(in);
use HTML::Template;

{
  package Wrapper::Notes::Template;
  use strict;
  use File::Spec;
  sub new {
    my ($class,$document, $attachmentdir) = @_;
    my $self = { document => $document, attachments => $attachmentdir };
    bless $self, $class;
    $self;
  };
  sub document { $_[0]->{document} };
  sub param {
    my ($self,@args) = @_;
    if (scalar @args) {
      my $result;
      if ($_[1] eq 'Attachments') {
        my $result = [];

        my $body = $self->document->GetFirstItem('Body');
        my @attachments = grep {
            warn join ":", $_->{Name}, $_->{Type}, $_->{Text};
            $_->{Type} == 4
        } (@{$self->document->Items()});
        mkdir $self->{attachments};
        for my $attname (@attachments) {
          my $url = File::Spec->catfile($self->{attachments},$attname);
          $url = File::Spec->rel2abs($url);
          #warn "Extracting $attname to $url";
          my $f = $self->document->getAttachment($attname);
          if ($f) {
            $f->extractFile($url);
            push @$result, { name => $attname, url => $url };
          };
        };
        return $result;
      } elsif ($_[1] eq 'EmbeddedObjects') {
        my $result = [];

        my $body = $self->document->GetFirstItem('Body');
        my $attachments = $body->EmbeddedObjects;
        if ($attachments) {
            mkdir $self->{attachments};
            for my $att (Win32::OLE::in $attachments) {
              warn $att->{Type};
              my $url = File::Spec->catfile($self->{attachments},$att->{Name});
              $url = File::Spec->rel2abs($url);
              $att->extractFile($url);
              push @$result, { name => $att->{Name}, url => $url };
            };
        };
        return $result;
      } else {
        $result = $self->document->{$_[1]};
      };
      if (ref $result) {
        return [ map { "value" => $_ }, @$result ];
      } else {
        $result;
      };
    } else {
      return (map { $_->Name } (Win32::OLE::in ($self->document->Items()))), "Attachments", "EmbeddedObjects";
    };
  };
};

my ($server,$database) = ('server','mail/corion.nsf');
my $Notes = Win32::OLE->new('Notes.NotesSession')
    or die "Cannot start Lotus Notes Session object.\n";
my ($Version) = ($Notes->{NotesVersion} =~ /\s*(.*\S)\s*$/);
print "The current user is $Notes->{UserName}.\n";
print "Running Notes \"$Version\" on \"$Notes->{Platform}\".\n";
my $Database = $Notes->GetDatabase($server, $database);
my $AllDocuments = $Database->AllDocuments;
my $Count = $AllDocuments->Count;
print "There are $Count documents in the database.\n";
my $Index = 4419;
while (++$Index <= $Count) {
    my $Document = $AllDocuments->GetNthDocument($Index);
    my $wrapper = Wrapper::Notes::Template->new($Document,sprintf "email/mail.%05g",$Index);
    my $template = HTML::Template->new(
                        filename => 'lotus-email.tmpl',
                        die_on_bad_params => 0,
                        loop_context_vars => 1,
                        associate => [ $wrapper ],
                        case_sensitive => 1,
                        );
    my $outfile = sprintf "email/mail.%05g.html", $Index;
    open MAIL, ">", $outfile or die "Couldn't create '$outfile' : $!\n";
    $template->output( print_to => *MAIL );
    close MAIL;
    last unless $Index <= 4420; # magic number!
}
Replies are listed 'Best First'.
Re: Extract Notes Mail to HTML
by Corion (Patriarch) on Apr 21, 2005 at 07:13 UTC

    This is the template that goes with the above snippet. The single Notes fields can be conveniently added to the template, if you want other fields than what I've provided.

    <html>
    <head>
    <title>Exported Lotus Notes email</title>
    </head>
    <body>
    <TMPL_LOOP NAME="Subject">
    <tt>Subject:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="Date">
    <tt>Date:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="Folder">
    <tt>Folder:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="From">
    <tt>From:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="SendTo">
    <tt>To:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="CopyTo">
    <tt>Cc:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="BlindCopyTo">
    <tt>Bcc:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="Categories">
    <tt>Categories:</tt> <TMPL_VAR NAME="value"><br />
    </TMPL_LOOP>
    <hr />
    <pre>
    <TMPL_VAR NAME="Body">
    </pre>
    <TMPL_LOOP NAME="Attachments">
    Attachment <br />
    <tt>Name:</tt> <a href="file://<TMPL_VAR NAME="url">"><TMPL_VAR NAME="name"></a><br />
    </TMPL_LOOP>
    <TMPL_LOOP NAME="EmbeddedObjects">
    Embedded <br />
    <tt>Name:</tt> <a href="file://<TMPL_VAR NAME="url">"><TMPL_VAR NAME="name"></a><br />
    </TMPL_LOOP>
    </body>
    </html>
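Adding a field to the template works because of HTML::Template's associate option: the template calls param() with no arguments on the associated object to learn which names it offers, then calls param($name) for each template parameter that has no value yet. The following is a minimal sketch of that protocol, not the wrapper from the snippet. HashFields is a hypothetical stand-in that serves values from a plain hash instead of a Notes document, and the field values are made up.

```perl
use strict;
use warnings;
use HTML::Template;

{
    # Hypothetical stand-in for the Notes wrapper: same param() protocol,
    # but the "document" is just a hash of field name => value(s).
    package HashFields;

    sub new {
        my ($class, %fields) = @_;
        return bless { %fields }, $class;
    }

    sub param {
        my ($self, $name) = @_;

        # Called without a name: report every field we can provide.
        return keys %$self unless defined $name;

        # Multi-valued fields become TMPL_LOOP data, one row per value,
        # each row exposing the value under the name "value".
        my $v = $self->{$name};
        return ref $v ? [ map { +{ value => $_ } } @$v ] : $v;
    }
}

my $text = <<'TMPL';
<TMPL_LOOP NAME="SendTo">To: <TMPL_VAR NAME="value">
</TMPL_LOOP>Body: <TMPL_VAR NAME="Body">
TMPL

my $fields = HashFields->new(
    SendTo => [ 'a@example.com', 'b@example.com' ],   # made-up values
    Body   => 'hello',
);

my $template = HTML::Template->new(
    scalarref         => \$text,
    die_on_bad_params => 0,
    case_sensitive    => 1,
    associate         => [ $fields ],
);

my $output = $template->output;
print $output;
```

Fields the template never mentions are simply not requested, which is why another Notes field only needs a new TMPL_LOOP in the template and no change to the script.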
Re: Extract Lotus Notes Mail to HTML
by diotalevi (Canon) on Apr 27, 2005 at 19:43 UTC

    You should be having the Domino server do the Domino->HTML conversion for you. You're going to throw away all the formatting when you use this method. At least change your title so it is obvious this is a brain-damaged implementation.

    Use ->GetFirstDocument/->GetNextDocument instead of ->GetNthDocument, always.

    You are using particularly inefficient loop code. Instead of ->GetNthDocument(), use ->GetFirstDocument/->GetNextDocument. The former is going to actually perform an internal loop starting from 1 every time to get that nth numbered document. It is the most horribly inefficient method you could choose. So knock it off. The author of this function has publicly apologised for ever inflicting it upon the world. You'd only know this if you were reading the Lotus Notes development forums.

    my $doc = $AllDocuments->GetFirstDocument;
    while ( $doc ) {
        # Fetch the next document prior to running code to guard against
        # someone deciding to delete $doc.
        my $NextDoc = $AllDocuments->GetNextDocument( $doc );

        ...

        $doc = $NextDoc;
    }

    A sample, written as Perl, of what ->GetNthDocument does in the nnotes.dll C code

    sub GetNthDocument {
        my ( $Collection, $n ) = @_;

        return undef if $n < 1;

        my $doc = $Collection->GetFirstDocument;
        my $ix = 1;
        while ( $ix < $n and $doc ) {
            $doc = $Collection->GetNextDocument( $doc );
            ++ $ix;
        }

        return $doc;
    }
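To put a number on the cost, here is a small sketch that counts how many documents each loop style visits. MockCollection is a hypothetical pure-Perl stand-in for a Notes document collection (its "documents" are just the integers 1..N), and its GetNthDocument copies the walk-from-the-start behaviour of the sample above. It needs neither Notes nor Win32::OLE.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a Notes document collection. It only
    # counts how many documents get visited so both loops can be compared.
    package MockCollection;

    sub new {
        my ($class, $size) = @_;
        return bless { docs => [ 1 .. $size ], steps => 0 }, $class;
    }

    sub Count { scalar @{ $_[0]{docs} } }

    sub GetFirstDocument {
        my ($self) = @_;
        return undef unless @{ $self->{docs} };
        $self->{steps}++;
        return $self->{docs}[0];
    }

    sub GetNextDocument {
        my ($self, $doc) = @_;
        # Documents are the integers 1..size, so the successor of
        # document $doc sits at array index $doc.
        return undef if $doc >= @{ $self->{docs} };
        $self->{steps}++;
        return $self->{docs}[$doc];
    }

    # Same walk-from-the-start behaviour as the sample above.
    sub GetNthDocument {
        my ($self, $n) = @_;
        return undef if $n < 1;
        my $doc = $self->GetFirstDocument;
        my $ix  = 1;
        while ($ix < $n and $doc) {
            $doc = $self->GetNextDocument($doc);
            ++$ix;
        }
        return $doc;
    }
}

# Visit every document by index, as the original script does.
sub steps_by_index {
    my ($size) = @_;
    my $c = MockCollection->new($size);
    $c->GetNthDocument($_) for 1 .. $c->Count;
    return $c->{steps};
}

# Visit every document with GetFirstDocument/GetNextDocument.
sub steps_by_iteration {
    my ($size) = @_;
    my $c   = MockCollection->new($size);
    my $doc = $c->GetFirstDocument;
    while ($doc) {
        $doc = $c->GetNextDocument($doc);
    }
    return $c->{steps};
}

printf "by index: %d visits, by iteration: %d visits\n",
    steps_by_index(100), steps_by_iteration(100);
```

For 100 documents the indexed loop makes 5050 visits (1 + 2 + ... + 100) against 100 for plain iteration, so the gap grows quadratically with the size of the mail file.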

    Try not to lose formatting by smashing RichText to plain text

    Since you have the cooperation of a Domino server, you could have fetched the document and its formatting from the server using LWP or such. Consider this an outline for a future attempt at fetching this. I'm doing some of these property calls from memory so there's a good chance I'm slightly off here.

    use LWP::Simple ();

    sub Doc2HTML {
        my ( $doc ) = @_;

        my $db = $doc->{Parent};
        my $filepath = $db->{FilePath};
        my $server = $db->{Server};

        # This is a bit of magic. I'm requesting the '0' view which will be
        # whatever view in the db was designated "default". Also, all views
        # can be used to fetch any document if you already know the
        # document's UNID.
        my $view = '0';
        my $unid = $doc->{UniversalID};

        return LWP::Simple::get( "http://$server/$filepath/$view/$unid" );
    }
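The URL in that outline may need some massaging before a web server accepts it. The sketch below only builds the string and fetches nothing. domino_doc_url is a hypothetical helper, and it rests on three assumptions that are not from the post above: the server property may come back as a hierarchical Notes name such as "CN=mail01/O=Example" whose common name matches the web host name, the file path may contain Windows backslashes, and the explicit ?OpenDocument URL command is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a Domino URL of the form
#   http://host/database/view/UNID?OpenDocument
sub domino_doc_url {
    my ($server, $filepath, $unid, $view) = @_;
    $view = '0' unless defined $view;    # '0' = default view, as in the outline

    # Assumption: keep only the common name of a hierarchical server name
    # and treat it as the web host name.
    my $host = $server;
    $host =~ s{^CN=}{}i;
    $host =~ s{/.*$}{};

    # A file path from a Windows server uses backslashes; URLs need
    # forward slashes and no leading slash on the database part.
    (my $path = $filepath) =~ tr{\\}{/};
    $path =~ s{^/+}{};

    return "http://$host/$path/$view/$unid?OpenDocument";
}

# Made-up server name and UNID, for illustration only.
print domino_doc_url(
    'CN=mail01/O=Example',
    'mail\\corion.nsf',
    'ABCDEF0123456789ABCDEF0123456789',
), "\n";
```

The result can be handed to LWP::Simple::get in place of the hand-built string. Whether the server answers still depends on the HTTP task running and on the access rights of the requesting user.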

    $doc->{Body} or fetching of other RichText items may yield truncated data

    In Domino, ordinary text values aren't allowed to get longer than 64K. I recall there are some issues with text being stored in a double-byte encoding, so you get an effective limit of somewhere around 32K. RichText items, by contrast, are allowed to hold up to 4 GB. For versions 4 and 5, there is no way to work around this via Win32::OLE. Use the C or C++ API to extract non-truncated data from RichText values in that case. I once wrote a C++ program for dumping non-truncated text from RichText fields. I will post the source for this later.

    I will post the R6-ish way of doing this later.

Re: Extract Lotus Notes Mail to HTML
by Zero_Flop (Pilgrim) on Apr 22, 2005 at 07:05 UTC
    A couple comments:

    Why are you limiting the export to 4420? I don't know if this is a limit imposed by 4.5, but if email is limited to 4420 your $Count should always be lower.

    Also, have you tried the "openmail" command to open the user's mail instead of hardcoding it to your own db?

    Good Work

    ZeroFlop

      The number of mails to export is limited to 4420 because that was the number of mails in my mailstore at the time. I did not find a way to get the number of mails in a mailstore, so I punted and hardcoded the number.

      I don't know of the "openmail" command, but I needed to export email from two databases and the easiest way seemed to be to hardcode the names.
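One way to avoid both hardcoded values is to take them from the command line. This is only a sketch: the option names --server, --database, --first and --last are hypothetical and do not exist in the snippet, and the 5000 below stands in for the document count the script already reads from the collection.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names; the defaults mirror the values hardcoded
# in the snippet.
sub parse_export_options {
    my (@argv) = @_;
    my %opt = (server => 'server', database => 'mail/corion.nsf', first => 1);
    GetOptionsFromArray(\@argv, \%opt,
        'server=s', 'database=s', 'first=i', 'last=i')
        or die "usage: $0 [--server S] [--database D] [--first N] [--last N]\n";
    return \%opt;
}

# Clamp the requested range to what the database actually holds.
sub clamp_range {
    my ($opt, $count) = @_;
    my $first = $opt->{first} < 1 ? 1 : $opt->{first};
    my $last  = defined $opt->{last} && $opt->{last} < $count
              ? $opt->{last}
              : $count;
    return ($first, $last);
}

my $opt = parse_export_options('--database', 'mail/archive.nsf', '--first', 4420);

# 5000 stands in for the count reported by the document collection.
my ($first, $last) = clamp_range($opt, 5000);
print "$opt->{database}: documents $first..$last\n";
```

Running the script once per database then covers the two-mailstore case without editing the source.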