PerlMonks  

fork() and batch processing mail (was: Quicker array processing)

by Anonymous Monk
on Jun 27, 2002 at 17:05 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I was thinking of a way to improve the speed of an e-mailing script where a bottleneck has arisen because of serial SMTP sending. I'd like to "parallelize" the processing of the address array into 2-4 parallel processes, each sending its part of the array via SMTP. How would I split the processing of @email_addresses like this? Create 2-4 sub-arrays? Fork the sending once the sub-arrays are populated? Basically, I set the e-mail message variables once (body, subject, from) and then run a foreach loop over an array of e-mail addresses, sending the message sequentially. Looking forward to some Monk wisdom.

edited: Fri Jun 28 00:17:41 2002 by jeffa - title change

Replies are listed 'Best First'.
Re: Quicker array processing
by kvale (Monsignor) on Jun 27, 2002 at 17:27 UTC
    It sounds like you are on the right track. I'd suggest the pre-forked server design as detailed in the Perl Cookbook. Basically, you fork off a process for each mail address until you have hit your desired max number of processes. Then wait for each child to finish and start another process at that time. Repeat until done. That will get the task done as quickly as possible, at some expense in space.

    From another program I wrote:
    use POSIX qw(:signal_h);    # sigprocmask, SIG_BLOCK, SIG_UNBLOCK, SIGINT

    my %children = ();    # keys are current child process IDs
    my $children = 0;     # current number of children

    # Install signal handlers for children.
    $SIG{CHLD} = \&REAPER;
    $SIG{INT}  = \&HUNTSMAN;

    # create a set of children
    for (1..$max_connections) {
        my $group = shift @groups;
        bear_child( $group) if defined $group;
        last if @groups == 0;
    }

    # And maintain the population.
    MAINTAIN:
    while (@groups > 0) {
        sleep;    # wait for a signal (i.e., child's death)
        for (my $i = $children; $i < $max_connections; $i++) {
            my $group = shift @groups;
            bear_child( $group);    # top up the child pool
            last MAINTAIN if @groups == 0;
        }
    }

    # wait for the last of the children to complete scanning
    sleep while $children > 0;

    # reap any wayward children and exit cleanly
    HUNTSMAN();

    sub REAPER {
        $SIG{CHLD} = \&REAPER;
        my $pid = wait;
        $children--;
        delete $children{$pid};
        print "reaping child $pid\n";
    }

    # handler for SIGINT and normal exit - kill all the children
    sub HUNTSMAN {
        local($SIG{CHLD}) = \&REAPER;    # we're going to kill our children
        kill 'INT' => keys %children;
        exit;
    }

    # spawn subprocesses to search groups
    sub bear_child {
        my $group = shift;
        my $pid;
        my $sigset;

        # block signal for fork
        $sigset = POSIX::SigSet->new(SIGINT);
        sigprocmask( SIG_BLOCK, $sigset)
            or die "Can't block SIGINT for fork: $!\n";

        # fork a subprocess
        die "fork: $!" unless defined ($pid = fork);

        if ($pid) {
            # Parent records the child's birth and returns.
            sigprocmask( SIG_UNBLOCK, $sigset)
                or die "Can't unblock SIGINT for fork: $!\n";
            $children{$pid} = 1;
            $children++;
            return;
        }
        else {
            # Child process must *not* return from this subroutine.
            $SIG{INT} = 'DEFAULT';    # make SIGINT kill us as it did before

            # unblock signals
            sigprocmask( SIG_UNBLOCK, $sigset)
                or die "Can't unblock SIGINT for fork: $!\n";

            # open a connection to the news server
            undef $nntp;
            until ($nntp) {
                $nntp = Net::NNTP->new( $news_server, Timeout => 60);
            }
            $nntp->authinfo( $username, $password) if defined $username;
            print "child $$ searching $group\n";

            # do assigned group
            process_group( $group);

            # close connection to news server
            $nntp->quit;
            print "child $$ is done processing $group\n";
            exit;    # make sure sub never returns
        }
    }
    Replace @groups with @address, and the NNTP stuff with your own processing. Season to taste.

    -Mark
Re: Quicker array processing
by grep (Monsignor) on Jun 27, 2002 at 17:26 UTC
    Have you looked into POE? Although I have not used it (for anything except little tests) and have only seen a demonstration of it, it looks like a good fit for your problem. You can easily feed your array to several different handlers (that will fork off for you).

    The notes on the demo that I saw are at http://tilderoot.com/~tcaine/poe. This should at least give you an idea if you want to use POE



    grep
    Just me, the boy and these two monks, no questions asked.
Re: Quicker array processing
by gav^ (Curate) on Jun 27, 2002 at 18:28 UTC
    I do something similar using Mail::Bulkmail and Parallel::ForkManager, basically:
    • Create array of emails for each process you want
    • Create a child (using Parallel::ForkManager) to handle each array -- see this node for info
    • In each child give Mail::Bulkmail the list and the email you want to send off
    Hope this helps...
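
    The steps above might be sketched roughly as follows. This is only an illustration using core fork and waitpid in place of Parallel::ForkManager and Mail::Bulkmail; split_into_chunks and send_batch are hypothetical names, and send_batch is a stub where the real sending would go.

```perl
use strict;
use warnings;

# Deal the items round-robin into $n roughly equal sub-arrays.
sub split_into_chunks {
    my ($n, @items) = @_;
    my @chunks = map { [] } 1 .. $n;
    my $i = 0;
    push @{ $chunks[ $i++ % $n ] }, $_ for @items;
    return grep { @$_ } @chunks;    # drop empty chunks when @items < $n
}

# Hypothetical placeholder: a real version would open one SMTP
# connection and send the message to every address in the batch.
sub send_batch {
    my (@batch) = @_;
    print "child $$ would send to: @batch\n";
}

my @email_addresses = map { "user$_\@example.com" } 1 .. 10;
my @chunks = split_into_chunks(3, @email_addresses);

my @pids;
for my $chunk (@chunks) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {          # child: handle one chunk, then exit
        send_batch(@$chunk);
        exit 0;
    }
    push @pids, $pid;         # parent: remember the child
}

waitpid($_, 0) for @pids;     # parent blocks until every child is done
```

    A static split like this is simple, but one slow chunk holds up the whole run; a pool that hands out work as children finish avoids that at the cost of more bookkeeping.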

    gav^

Re: fork() and batch processing mail (was: Quicker array processing)
by flyboy (Novice) on Jun 29, 2002 at 08:36 UTC
    I just got done writing an application that does exactly what you want to do. I will give you this one bit of advice. Don't use a foreach loop, like:
    for my $email (@emails){
        #fork();
    }

    I suggest this:
    while (@emails){
        my $email = shift(@emails);
        #fork();
    }

    It will reduce memory usage. At least over time. I'm sure there is a better way, but it works for me.
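
    One way to flesh out the shift loop above, as a minimal sketch: cap the number of live children and block in wait whenever the cap is reached. $max_children and deliver are hypothetical names, and deliver is a stub standing in for the real send code.

```perl
use strict;
use warnings;

my $max_children = 4;    # assumed cap, tune to taste
my @emails = map { "user$_\@example.com" } 1 .. 10;

# Hypothetical stub: the real send code goes here.
sub deliver {
    my ($email) = @_;
    print "child $$ handling $email\n";
}

my $running = 0;    # children currently alive
my $reaped  = 0;    # children collected so far

while (@emails) {
    my $email = shift @emails;          # the queue shrinks as we go

    if ($running >= $max_children) {    # pool full: wait for one to exit
        wait();
        $running--;
        $reaped++;
    }

    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                    # child: one address, then exit
        deliver($email);
        exit 0;
    }
    $running++;
}

# Collect the stragglers; wait returns -1 once no children remain.
$reaped++ while wait() != -1;
```

    Forking once per address is the simplest thing that works, though for large lists it may be cheaper to hand each child a batch of addresses so that a child can reuse one SMTP connection.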

    To handle the message array processing, I set up @mime_data and @text_data arrays. Then in my send code I do something like this:
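    my @foo;
    for my $line (@mime_data){
        # substitute on a copy so the template survives for the next address
        (my $copy = $line) =~ s/PATTERN/$var/g;
        push(@foo,$copy);
    }
    # $smtp is a connected Net::SMTP object; data() is an object method
    $smtp->data("To: $email\n",@foo);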
    So I obviously use Net::SMTP. That is the only module I use, I suggest you do the same unless you want to bloat your code - or you know more than I do. :)
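
    For what it's worth, the substitute-then-send step above could be sketched like this with a reusable connection. The {NAME} placeholder syntax, the personalise and send_one helpers, the example addresses, and the SMTP_HOST environment variable are all assumptions made up for the illustration; only mail, to, data and quit are Net::SMTP methods.

```perl
use strict;
use warnings;
use Net::SMTP;

# Return a personalised copy of the template; the template is left untouched.
# Unknown placeholders are kept as-is.
sub personalise {
    my ($template, %vars) = @_;
    my @lines = @$template;
    for my $line (@lines) {
        $line =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{" . $1 . "}"/ge;
    }
    return @lines;
}

# Send one message over an already-open connection.
# $smtp only needs the mail/to/data methods that Net::SMTP provides.
sub send_one {
    my ($smtp, $from, $to, @message) = @_;
    $smtp->mail($from)                 or return 0;
    $smtp->to($to)                     or return 0;
    $smtp->data("To: $to\n", @message) or return 0;
    return 1;
}

my @template = (
    "From: news\@example.com\n",
    "Subject: Hello {NAME}\n",
    "\n",
    "Dear {NAME}, this is the newsletter.\n",
);

# Only talk to a server when one is named in the (assumed) SMTP_HOST variable.
if (my $host = $ENV{SMTP_HOST}) {
    my $smtp = Net::SMTP->new($host, Timeout => 60)
        or die "cannot connect to $host\n";
    for my $rcpt (['alice@example.com', 'Alice'], ['bob@example.com', 'Bob']) {
        my ($addr, $name) = @$rcpt;
        send_one($smtp, 'news@example.com', $addr,
                 personalise(\@template, NAME => $name))
            or warn "send to $addr failed\n";
    }
    $smtp->quit;
}
```

    Keeping one connection open per child and sending many messages over it avoids paying the connection setup cost for every address.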
