How to do parallel processing within mod_perl

by Anonymous Monk
on Nov 12, 2007 at 13:30 UTC ( [id://650263]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I am seeking your wisdom. I have a mod_perl application that searches some databases (custom socket connections, no DBI). The code looks like this:
sub metasearch {
    my $self = shift;
    my @dbs  = @{$self->dbs};
    my @resultset;
    foreach my $db (@dbs) {
        my $result = $db->do_query;
        push @resultset, $result;
    }
    return \@resultset;
}

Now I would like to do the "foreach" part in parallel and don't know how.

I had it working fine in a standalone test script with Parallel::ForkManager, but that gave an error when run under mod_perl ("ModPerl::Util::exit: (120000) exit was called at /usr/local/share/perl/5.8.8/Parallel/ForkManager.pm line 306").

Then I read the threading tutorial and was more puzzled than before. I also couldn't find an example on Google that iterates over the databases (I don't know in advance how many and which databases will be needed) and collects the results. It would also be nice to have a timeout, just in case one of the databases is very slow or even dead.

Any ideas how to (best) do this?

Many thanks
-Michael

Replies are listed 'Best First'.
Re: How to do parallel processing within mod_perl
by jbert (Priest) on Nov 12, 2007 at 14:05 UTC
    An alternative (since you are working over your own socket layer) would be to use non-blocking sockets.

    You could then dispatch your request to each db and select on your socket connections, collecting replies as they come back.

    Timeouts fall out of this fairly easily too, but you'll need to be careful if you subsequently re-use a socket and get a response to a timed-out query.

    There are various wrappers around this. You can use IO::Socket and IO::Select as a starting point, but there are probably higher-level modules. Anyone else got any ideas?

    So something like this:

    sub metasearch {
        my $self = shift;

        # Note: could make dbs return array in array context
        # with 'wantarray'
        my @dbs = @{$self->dbs};

        my @db_socks;
        foreach my $db (@dbs) {
            my $sock = $db->connect;
            send_nonblocking_query($sock);
            push @db_socks, $sock;
        }

        my $timeout_at = time() + $timeout_secs;
        while (@db_socks && time() < $timeout_at) {
            # ..select on @db_socks, timing out at timeout_at
            # ..handle a wakeup on a socket by reading the response
        }

        # error handling of remaining @db_socks
    }
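
    The select loop outlined above could be filled in roughly as follows. This is only a minimal sketch using the core IO::Select and Socket modules: collect_replies is a hypothetical helper name, the socketpair calls merely stand in for real database connections, and the code assumes that a single sysread returns a complete reply, which a real protocol may not guarantee.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# collect_replies(\%sock_for, $timeout_secs): hypothetical helper.
# Waits until every socket has answered or the deadline has passed.
# Returns ({ name => reply }, [ names that did not answer in time ]).
sub collect_replies {
    my ($sock_for, $timeout_secs) = @_;

    my %name_for;
    $name_for{ fileno($sock_for->{$_}) } = $_ for keys %$sock_for;

    my $select = IO::Select->new(values %$sock_for);
    my %reply;

    my $timeout_at = time() + $timeout_secs;
    while ($select->count) {
        my $remaining = $timeout_at - time();
        last if $remaining <= 0;

        for my $sock ($select->can_read($remaining)) {
            # Assumption: one sysread yields the whole reply.
            my $n = sysread($sock, my $buf, 4096);
            $reply{ $name_for{ fileno($sock) } } = $buf if $n;
            $select->remove($sock);
        }
    }

    my @timed_out = map { $name_for{ fileno($_) } } $select->handles;
    return (\%reply, \@timed_out);
}

# Demo: three fake "databases"; only two of them ever answer.
my (%sock_for, %server_end);
for my $name (qw(db1 db2 dead)) {
    socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    $sock_for{$name}   = $client;
    $server_end{$name} = $server;
}
syswrite($server_end{$_}, "hits from $_") for qw(db1 db2);

my ($reply, $timed_out) = collect_replies(\%sock_for, 1);
print "$_: $reply->{$_}\n" for sort keys %$reply;
print "timed out: @$timed_out\n";
```

    Sockets that miss the deadline are returned by name so the caller can close them rather than re-use them, which sidesteps the late-response problem mentioned earlier in this reply.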
Re: How to do parallel processing within mod_perl
by snowhare (Friar) on Nov 12, 2007 at 14:20 UTC
    You are severely constrained by the fact that you are running under mod_perl. Your best option is probably to set up a separate daemon that does your parallel queries for you and returns the result to your mod_perl process via a socket.

      Your separate daemon could very well be just another mod_perl process.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: How to do parallel processing within mod_perl
by perlfan (Vicar) on Nov 12, 2007 at 17:32 UTC
    This sounds like a case for POE.
Re: How to do parallel processing within mod_perl
by perrin (Chancellor) on Nov 12, 2007 at 18:50 UTC
    Your forking approach should work. You can read about how to do forking safely from mod_perl in the mod_perl docs.
Re: How to do parallel processing within mod_perl
by Anonymous Monk on Nov 13, 2007 at 09:02 UTC

    Thanks for all the suggestions, but does anyone have any pointers, either to docs or, better yet, sample code?

    Searches for "mod_perl" and "fork" or "threads" tend to give lots of hits about Apache MPMs but very little about fork/threads within mod_perl.

    POE looks very interesting but also very much like a whole new framework to learn. It is on my todo list but I had hoped for something I could use after reading a man page or two but not hundreds of them ;-)

        Thanks for the pointer. I tried to follow the example as closely as possible and came up with this (not properly working) code:

        sub metasearch {
            my $self   = shift;
            my $db_ref = $self->db_defs;
            my @dbs    = @{$self->dbs};
            my $logger = $self->logger;

            $SIG{CHLD} = 'IGNORE';
            my @result;
            foreach my $db (@dbs) {
                defined (my $kid = fork) or die "Cannot fork: $!\n";
                $logger->debug("Processing $db in process $kid");
                if ($kid) {
                    $logger->debug("Parent $$ has finished, kid's PID: $kid");
                }
                else {
                    # $r->cleanup_for_exec(); # untie the socket
                    open STDIN,  '/dev/null'  or die "Can't read /dev/null: $!";
                    open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
                    open STDERR, '>/tmp/log'  or die "Can't write to /tmp/log: $!";
                    # setsid or die "Can't start a new session: $!";

                    my $oldfh = select STDERR;
                    local $| = 1;
                    select $oldfh;
                    warn "started\n";
                    sleep 1, warn "$_\n" for 1..20;
                    warn "completed\n";
                    push @result, "simulate result for $db in $kid";
                    CORE::exit(0); # terminate the process
                }
                $logger->debug("End of $db");
            }
            $logger->debug(join(', ', @result));
        }

        The difference is a foreach around the forks and the attempt to collect some result from the forked processes in @result but @result is empty.

        What am I doing wrong?

        As a side note, since I am not really interested in the warn output: /tmp/log is empty after the run. Does this indicate that the children didn't run?

        I left out the cleanup_for_exec and the setsid lines because from the explanation I gathered they are not necessary in my situation, and I didn't know whether the cleanup_for_exec call is still the same in mod_perl2 (the example is for mod_perl1).
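
        On the empty @result: after fork each child works on its own copy of the parent's memory, so a push in the child never reaches the parent's array. The result has to be sent back explicitly, for example through a pipe. The following standalone sketch shows one way to do that with a deadline. parallel_map is a hypothetical helper name, the worker sub merely simulates a query, and the mod_perl-specific precautions from the mod_perl docs are not covered here.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# parallel_map(\&work, \@items, $timeout_secs): hypothetical helper.
# Runs $work->($item) in one child per item and returns
# { item => result } for every child that answered before the deadline.
sub parallel_map {
    my ($work, $items, $timeout_secs) = @_;
    my $select = IO::Select->new;
    my (%item_for, %pid_for, %buffer, %result);

    for my $item (@$items) {
        pipe(my $reader, my $writer) or die "Cannot create pipe: $!";
        defined(my $pid = fork) or die "Cannot fork: $!";
        if ($pid == 0) {
            # Child: compute, send the result through the pipe, leave.
            close $reader;
            print {$writer} $work->($item);
            close $writer;
            POSIX::_exit(0);    # exit immediately, without running END blocks
        }
        # Parent: keep only the read end and remember whose it is.
        close $writer;
        my $fd = fileno($reader);
        $select->add($reader);
        $item_for{$fd} = $item;
        $pid_for{$fd}  = $pid;
        $buffer{$fd}   = '';
    }

    my $timeout_at = time() + $timeout_secs;
    while ($select->count) {
        my $remaining = $timeout_at - time();
        last if $remaining <= 0;
        for my $fh ($select->can_read($remaining)) {
            my $fd = fileno($fh);
            my $n  = sysread($fh, my $chunk, 4096);
            if ($n) { $buffer{$fd} .= $chunk; next; }
            # End of file: the child closed its end, so it is done.
            $result{ $item_for{$fd} } = $buffer{$fd} if defined $n;
            $select->remove($fh);
            close $fh;
            waitpid($pid_for{$fd}, 0);
        }
    }

    # Children that missed the deadline are terminated and reaped.
    for my $fh ($select->handles) {
        my $pid = $pid_for{ fileno($fh) };
        kill 'TERM', $pid;
        waitpid($pid, 0);
    }
    return \%result;
}

# Demo: the "slow" database sleeps past the 2-second deadline.
my $results = parallel_map(
    sub { my $db = shift; sleep 10 if $db eq 'slow'; "hits from $db" },
    [qw(db1 db2 slow)],
    2,
);
print "$_: $results->{$_}\n" for sort keys %$results;
```

        The sketch reaps each child with waitpid and therefore does not set $SIG{CHLD} to 'IGNORE'.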
