How to do parallel processing within mod_perl

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I am seeking your wisdom. I have a mod_perl application that searches some databases (custom socket connections, no DBI). The code looks like this:

sub metasearch {
    my $self = shift;
    my @dbs = @{$self->dbs};

    my @resultset;    
    foreach my $db (@dbs) {
        my $result = $db->do_query;
        push @resultset, $result;
    }
    return \@resultset;
}

Now I would like to do the "foreach" part in parallel and don't know how.

I had it working fine in a standalone test script with Parallel::ForkManager, but that gave an error when run under mod_perl ("ModPerl::Util::exit: (120000) exit was called at /usr/local/share/perl/5.8.8/Parallel/ForkManager.pm line 306").

Then I read the threading tutorial and was more puzzled than before. Also in Google I couldn't find an example with iteration (I don't know in advance how many and which databases will be needed) and collection of the results. It would also be nice to have a timeout, just in case one of the databases is very slow or even dead.

Any ideas how to (best) do this?

Many thanks
-Michael


Replies are listed 'Best First'.
Re: How to do parallel processing within mod_perl by jbert (Priest) on Nov 12, 2007 at 14:05 UTC
An alternative (since you are working over your own socket layer) would be to use non-blocking sockets. You could then dispatch your request to each db and `select` on your socket connections, collecting replies as they come back. Timeouts fall out of this fairly easily too, but you'll need to be careful if you subsequently re-use a socket and get a response to a timed-out query.

There are various wrappers around this. You can use IO::Socket and IO::Select as a starting point, but there are probably higher-level modules. Anyone else got any ideas?

So something like this:

sub metasearch {
    my $self = shift;
    # Note: could make dbs return array in array context
    # with 'wantarray'
    my @dbs = @{$self->dbs};

    my @db_socks;
    foreach my $db (@dbs) {
        my $sock = $db->connect;
        send_nonblocking_query($sock);
        push @db_socks, $sock;
    }

    my $timeout_at = time() + $timeout_secs;
    while (@db_socks && time() < $timeout_at) {
        # ..select on @db_socks, timing out at timeout_at
        # ..handle a wakeup on a socket by reading the response
    }
    # error handling of remaining @db_socks
}
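For what it's worth, the select loop sketched above could be filled in roughly like this. It is only a minimal sketch under some assumptions that are not from the thread: each database answers with a single newline-terminated line, `collect_replies` is a made-up helper name, and the two `socketpair` handles merely stand in for real database connections so the example runs on its own. It uses core IO::Select and calls `sysread` only on handles that `can_read` reports as ready, so the handles need not be put into non-blocking mode for this simple case.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Time::HiRes qw(time);

# Hypothetical helper: wait for one newline-terminated reply per handle.
# $socks is { name => filehandle }. Returns { name => reply }; handles
# that did not answer before the deadline are simply missing.
sub collect_replies {
    my ($socks, $timeout_secs) = @_;
    my %name_of;
    $name_of{ fileno($socks->{$_}) } = $_ for keys %$socks;

    my $select = IO::Select->new(values %$socks);
    my (%buffer, %reply);
    my $deadline = time() + $timeout_secs;

    while ($select->count) {
        my $remaining = $deadline - time();
        last if $remaining <= 0;
        for my $fh ($select->can_read($remaining)) {
            my $name = $name_of{ fileno($fh) };
            my $n = sysread($fh, my $chunk, 4096);
            $buffer{$name} .= $chunk if $n;
            # Finished with this handle on EOF, error, or a complete line.
            if (!$n || $buffer{$name} =~ /\n/) {
                $reply{$name} = $buffer{$name} if defined $buffer{$name};
                $select->remove($fh);
            }
        }
    }
    return \%reply;
}

# Demo: one "database" answers at once, the other never answers.
socketpair(my $fast_client, my $fast_server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
socketpair(my $dead_client, my $dead_server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($fast_server, "3 hits\n");

my $replies = collect_replies({ fast => $fast_client, dead => $dead_client }, 0.5);
print "fast: $replies->{fast}";
print "dead: no reply before the timeout\n" unless exists $replies->{dead};
```

A handle that timed out should probably be closed rather than reused, which sidesteps the stale-response problem mentioned above.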
Re: How to do parallel processing within mod_perl by snowhare (Friar) on Nov 12, 2007 at 14:20 UTC
You are severely constrained by the fact that you are running under mod_perl. Your best option is probably to set up a separate daemon that does your parallel queries for you and returns the result to your mod_perl process via a socket.
Re^2: How to do parallel processing within mod_perl by diotalevi (Canon) on Nov 12, 2007 at 16:16 UTC
Your separate daemon could very well be just another mod_perl process. ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
Re: How to do parallel processing within mod_perl by perlfan (Vicar) on Nov 12, 2007 at 17:32 UTC
This sounds like a case for POE.
Re: How to do parallel processing within mod_perl by perrin (Chancellor) on Nov 12, 2007 at 18:50 UTC
Your forking approach should work. You can read about how to do forking safely from mod_perl in the mod_perl docs.
Re: How to do parallel processing within mod_perl by Anonymous Monk on Nov 13, 2007 at 09:02 UTC
Thanks for all the suggestions, but does anyone have any pointers, either to docs or, better yet, sample code? Searches for "mod_perl" and "fork" or "threads" tend to give lots of hits about Apache MPMs but very little about fork/threads within mod_perl.

POE looks very interesting but also very much like a whole new framework to learn. It is on my todo list, but I had hoped for something I could use after reading a man page or two, not hundreds of them ;-)
Re^2: How to do parallel processing within mod_perl by perrin (Chancellor) on Nov 13, 2007 at 12:57 UTC
http://modperlbook.org/html/10-2-Forking-and-Executing-Subprocessesfrom-mod_perl.html
Re^3: How to do parallel processing within mod_perl by Anonymous Monk on Nov 14, 2007 at 12:11 UTC
Thanks for the pointer. I tried to follow the example as closely as possible and came up with this (not properly working) code:

sub metasearch {
    my $self   = shift;
    my $db_ref = $self->db_defs;
    my @dbs    = @{$self->dbs};
    my $logger = $self->logger;

    $SIG{CHLD} = 'IGNORE';

    my @result;
    foreach my $db (@dbs) {
        defined (my $kid = fork) or die "Cannot fork: $!\n";
        $logger->debug("Processing $db in process $kid");
        if ($kid) {
            $logger->debug("Parent $$ has finished, kid's PID: $kid");
        }
        else {
            # $r->cleanup_for_exec(); # untie the socket
            open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
            open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
            open STDERR, '>/tmp/log' or die "Can't write to /tmp/log: $!";
            # setsid or die "Can't start a new session: $!";

            my $oldfh = select STDERR;
            local $| = 1;
            select $oldfh;
            warn "started\n";

            sleep 1, warn "$_\n" for 1..20;
            warn "completed\n";
            push @result, "simulate result for $db in $kid";

            CORE::exit(0); # terminate the process
        }
        $logger->debug("End of $db");
    }
    $logger->debug(join(', ', @result));
}

The difference is a foreach around the forks and the attempt to collect some result from the forked processes in @result, but @result is empty. What am I doing wrong?

As a side note (since I am not really interested in the warn output): /tmp/log is empty after the run. Does this indicate that the child(ren) didn't run?

I left out the cleanup_for_exec and setsid lines because, from the explanation, I gathered they are not necessary in my situation, and I didn't know whether the cleanup_for_exec call is still the same in mod_perl2 (the example is for mod_perl1).
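One thing that would explain the empty @result regardless of anything mod_perl does: after fork, each child works on its own copy of the parent's memory, so a push in the child is never seen by the parent. Results have to be sent back explicitly, for example through a pipe. Below is a minimal sketch of that idea, not a tested mod_perl recipe. The name `parallel_map`, the string-only results, and the timeout handling are assumptions made for illustration. It also does not set `$SIG{CHLD} = 'IGNORE'`, because it reaps each child itself with `waitpid`.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper: run $work->($item) in one child per item and read
# each child's result (a plain string) back through a pipe. Children that
# have not finished by the deadline are killed and left out of the result.
sub parallel_map {
    my ($work, $items, $timeout_secs) = @_;
    my $select = IO::Select->new;
    my (%job, %output);

    for my $item (@$items) {
        pipe(my $reader, my $writer) or die "Cannot create pipe: $!";
        defined(my $pid = fork) or die "Cannot fork: $!";
        if ($pid == 0) {                        # child
            close $reader;
            my $out = eval { $work->($item) };  # never let a die escape the child
            print {$writer} $out if defined $out;
            close $writer;                      # flushes the buffered result
            CORE::exit(0);                      # real exit, as in the post above
        }
        close $writer;                          # parent keeps only the read end
        $job{ fileno($reader) } = { pid => $pid, item => $item };
        $select->add($reader);
    }

    my $deadline = time() + $timeout_secs;
    while ($select->count) {
        my $remaining = $deadline - time();
        last if $remaining <= 0;
        for my $fh ($select->can_read($remaining)) {
            my $j = $job{ fileno($fh) };
            my $n = sysread($fh, my $chunk, 65536);
            if ($n) { $output{ $j->{item} } .= $chunk; next; }
            # EOF or error: the child has closed its end, so reap it.
            $select->remove($fh);
            close $fh;
            waitpid($j->{pid}, 0);
            $j->{done} = 1;
        }
    }

    # Anything still running has missed the deadline.
    for my $j (grep { !$_->{done} } values %job) {
        kill 'TERM', $j->{pid};
        waitpid($j->{pid}, 0);
        delete $output{ $j->{item} };
    }
    return \%output;
}

# Usage: item names double as hash keys in the result.
my $results = parallel_map(sub { "simulated result for $_[0]" }, [qw(db1 db2)], 5);
print "$_: $results->{$_}\n" for sort keys %$results;
```

Structured results would need to be serialized before being printed into the pipe (Storable's `freeze` and `thaw` are one core option). Under mod_perl the child also inherits the server's open handles, which is what the chapter linked above is about, so that part still needs care.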
Re^4: How to do parallel processing within mod_perl by perrin (Chancellor) on Nov 14, 2007 at 13:51 UTC
Re^5: How to do parallel processing within mod_perl by Anonymous Monk on Nov 14, 2007 at 14:43 UTC
Some notes below your chosen depth have not been shown here

