fixed set of forked processes

anonymized user 468275 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: fixed set of forked processes by derby (Abbot) on Dec 02, 2010 at 18:33 UTC
*nix box? Go with Parallel::ForkManager. -derby	[reply]
Re^2: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:39 UTC
Bingo! One world, one people	[reply]
Re: fixed set of forked processes by sundialsvc4 (Abbot) on Dec 02, 2010 at 18:25 UTC
I would approach this sort of problem by defining a fixed and configurable (small) number of threads, all of which are built to do the same thing: to read a work-request from a single queue (e.g. Thread::Queue::Duplex), perform the unit of work (in an `eval{}` block), and write a response-record to the same or to a different queue. All of the threads, no matter how many there are, are reading and writing from the same queues. So, when a record is written to “the request queue,” no one really cares which thread winds up picking-up the request and running it. The threads, in turn, are built to survive. Any runtime error that may occur during processing is absorbed, and a record of that event is merely added to the response-record for someone else down the line to deal with. To avoid too-much competition for the “single file,” you might dedicate one thread to the task of reading a block of records from the file and shoving them into the request queue. By some appropriate means, let the thread snooze until the number of enqueued items drops below some threshhold, at which time it reads a few more records from the file to recharge the queues. In this way, the jobs are indeed “processed in parallel,” but you maintain control over the attempted multiprogramming-level at all times. Such a system could perform work at a predictable and steady rate no matter how many jobs ultimately needed to be run. The size of that file would not affect the rate at which work was carried out; only the amount of wall-time required to do it.
Re^2: fixed set of forked processes by BrowserUk (Patriarch) on Dec 02, 2010 at 20:13 UTC
Please don't suggest the use of Thread::Queue::Duplex until you've used it, and therefore encountered its limitations. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^2: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:33 UTC
hmm although I prefer fork to thread so as to surivive later versions of perl, this does give me an idea of how to implement my own threads using fork in a way that overcomes my filehandle problem with the standard drop-in solution:- Since I know in advance I am going to use the max configured subprocesses given that there are 150000 jobs in the queue being rapidly thrown at my scheduling architecture, I could start by forking precisely that number of subprocesses using open \|- and let the children live to the very end, sending the code they have to manage over the pipe. update: but then whether I do that or use your queued thread approach, I need also to read back from the child in order to perform complicated load-balancing. If the subprocesses are allowed to die per iteration (i.e per job parsed and submitted to a child or thread) I wouldn't have that problem One world, one people	[reply]
Re: fixed set of forked processes by Illuminatus (Curate) on Dec 02, 2010 at 17:57 UTC
I think a few more details are in order How will the jobs actually be run? You mention fork, but not exec or system If you want to communicate between parent and child, you probably don't want file-based IO. You would probably want IO::Socket instead. Use socketpair to create 2-way communication If you aren't going to exec the child into the jil job, you are probably better off using threads. Then you can simply use shared data to communicate and manage the children fnord	[reply]
Re^2: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:06 UTC
It would be IPC:: open3 or run3 the child will then handle the results of it's own spawning. The parent only needs to know whether the child is still alive. So the communication is a bit fake -- Parent only wants to know if closing the IPC filehandle to the child fails, signifying that the child has exited -- all this assuming there isn't another way to know if the child lives. update: or rather closing the filehandle is what perlipc suggests but what if I need to poll repeatedly for exited? update: socketpair is just a wrapper to what I already discussed. It complicates the question of how to create the multiple filehandles and still leaves the question of how to destroy them. One world, one people	[reply]
Re^3: fixed set of forked processes by Illuminatus (Curate) on Dec 02, 2010 at 18:14 UTC
have you looked at threads? You can create the threads, then use is_running to check if they are alive. If you need to pass actual data, you can use the aforementioned shared data. fnord	[reply]
Re^4: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:25 UTC
Re: fixed set of forked processes by salva (Canon) on Dec 02, 2010 at 17:58 UTC
What kind of data do you want to pass back from the children to the parent? Is it just some boolean indicating failure/success, a line of text or some complex data structure?	[reply]
Re^2: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 02, 2010 at 18:11 UTC
no data. not even failure or success, child capable of handling its own results. But the parent does need to know how many children are left running and update to OP: I also would prefer to avoid spawning a unix grep of unix ps to count subprocesses - that is also an unwanted overhead. One world, one people	[reply]
Re^3: fixed set of forked processes by salva (Canon) on Dec 02, 2010 at 18:55 UTC
Then use Parallel::ForkManager or Proc::Queue.	[reply]
Re^4: fixed set of forked processes by tomfahle (Priest) on Dec 03, 2010 at 05:01 UTC
Re^3: fixed set of forked processes by pileofrogs (Priest) on Dec 02, 2010 at 21:45 UTC
Your parent process will know the PID of each child process, so you can `kill 0, $pid` to see if it's running. Not sure what the performance implications are. Since you were mentioning ps & grep, I thought I'd mention this simple alternative. If all you want is to know if children are alive or dead, you might want to look into $SIG{CHLD} which tells you when kids die. That's in perlipc too. --Pileofrogs	[reply] [d/l]
Re: fixed set of forked processes by anonymized user 468275 (Curate) on Dec 08, 2010 at 18:12 UTC
My thanks to all who offered advice. Here is the solution now put into practice (most of the irrelevant code omitted). Note I had to use a homegrown alternative to FileHandle because its documentation didn't mention support of Open3. That was just an array of self-numbering filehandles for easy deletion. package jiloader; use Parallel::ForkManager; use IPC::Open3; use Fcntl qw(:flock SEEK_END); sub new { ... $opt { MAXFORKS } \|\|= 31; $opt{ PM } = new Parallel::ForkManager( $opt{ MAXFORKS } ); $opt{ PMFH } = []; # filehandle pool ... # example of one of four files that track things unlink $opt{REJFILE}; open my $rejh, ">>$opt{REJFILE}" or die "$!: $opt{REJFILE}\n"; $opt{ REJH } = $rejh; ... bless \%opt; } ... sub put { # batch changes by outer box # for submission to fork scheduler my $self = shift; if ( $self -> { REST } ) { if ( $self -> { TOPBOX } ) { $self -> { BATCH } and $self -> sched; } $self -> { BATCH } .= $self -> { CHG }; } else { $self -> { BATCH } .= $self -> { CHG }; $self -> sched; } } sub sched { my $self = shift; my $pm = $self -> { PM }; my $rh = $self -> getfh; my $wh = $self -> getfh; my $eh = $self -> getfh; # parent has allocated fh's so has to # free them when child exits # child cannot do this whatever the fh pooling solution $pm -> run_on_finish( sub { $self -> killfh( $rh, $wh, $eh ); +} ); unless( $pm -> start ) { open3 $wh, $rh, $eh, $self -> { JILCOMMAND }; print $wh $self -> { BATCH }; close $wh; unless ( $self -> jiloutparse( $rh, $eh ) ) { my $errh = $self -> { ERRH }; flock $self -> { ERRFILE }, LOCK_EX; print $errh $self -> { LASTOUT }; print $errh $self -> { LASTERR }; flock $self -> { ERRFILE }, LOCK_UN } close $rh; close $eh; $pm -> finish; } $self -> { BATCH } = ''; } ... [download] One world, one people	[reply] [d/l]


Your skill will accomplish what the force of many cannot
	PerlMonks