Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^3: Parallel::ForkManager and wait_all_children

by rgren925 (Beadle)
on May 14, 2015 at 22:16 UTC ( [id://1126701]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Parallel::ForkManager and wait_all_children
in thread Parallel::ForkManager and wait_all_children

I am now using your code posted above (using open3 instead of exec/system, etc.):
use strict; use warnings; use Parallel::ForkManager; use IPC::Open3 qw( open3 ); use POSIX qw( WNOHANG ); use constant TIMEOUT => 120; my @runArray = ("test1.sh", "test2.sh", "test3.sh", "test4.sh", "test5 +.sh"); my ($pid, $exitCode, $ident); my $currentTime; my $forkMgr = Parallel::ForkManager->new(3); $forkMgr->run_on_start( sub { ($pid, $ident) = @_; print "$currentTime Started ==> $ident\n"; } ); $forkMgr->run_on_finish( sub { ($pid, $exitCode, $ident) = @_; print "$currentTime Ended ==> $ident\n"; } ); while (1) { $currentTime = localtime(); for my $runCommand (@runArray) { $forkMgr->start($runCommand) and next; my $pid = open3('<&STDIN', '>&STDOUT', '>&STDERR', "/usr/localcw/opt/patrol/nagios/libexec/$runCo +mmand"); wait_for_test_to_end($pid); $forkMgr->finish($? & 0x7F ? 0x80 | ($? & 0x7F) : $? >> 8); } $forkMgr->wait_all_children; sleep 10; } exit; sub wait_for_test_to_end { my ($pid) = @_; my $abs_timeout = time() + TIMEOUT; while (1) { return if waitpid($pid, WNOHANG) > 0; last if time() > $abs_timeout; sleep(1); } kill(ALRM => $pid); $abs_timeout = time() + 15; while (1) { return if waitpid($pid, WNOHANG) > 0; last if time() > $abs_timeout; sleep(1); } kill(KILL => $pid); waitpid($pid, 0); }
Still same behavior.
The looping test4.sh is still hanging everything up.
Here's some trace output. The "Started/Ended" statements are coming from the callbacks and the "I am running..." are coming from the test1-5.sh scripts.
Thu May 14 16:52:40 2015 Started ==> test1.sh Thu May 14 16:52:40 2015 Started ==> test2.sh Thu May 14 16:52:41 CDT 2015 I am running test1.sh Thu May 14 16:52:41 CDT 2015 I am running test2.sh Thu May 14 16:52:41 CDT 2015 I am running test3.sh Thu May 14 16:52:40 2015 Started ==> test3.sh Thu May 14 16:52:40 2015 Ended ==> test3.sh Thu May 14 16:52:40 2015 Ended ==> test2.sh Thu May 14 16:52:40 2015 Ended ==> test1.sh Thu May 14 16:52:40 2015 Started ==> test4.sh Thu May 14 16:52:43 CDT 2015 I am running test4.sh Thu May 14 16:52:43 CDT 2015 I am running test5.sh Thu May 14 16:52:53 CDT 2015 I am running test4.sh Thu May 14 16:53:03 CDT 2015 I am running test4.sh Thu May 14 16:53:13 CDT 2015 I am running test4.sh Thu May 14 16:53:23 CDT 2015 I am running test4.sh Thu May 14 16:53:33 CDT 2015 I am running test4.sh Thu May 14 16:53:43 CDT 2015 I am running test4.sh Thu May 14 16:53:53 CDT 2015 I am running test4.sh Thu May 14 16:54:03 CDT 2015 I am running test4.sh Thu May 14 16:54:13 CDT 2015 I am running test4.sh Thu May 14 16:54:23 CDT 2015 I am running test4.sh Thu May 14 16:54:33 CDT 2015 I am running test4.sh Thu May 14 16:54:43 CDT 2015 I am running test4.sh Thu May 14 16:52:40 2015 Started ==> test5.sh Thu May 14 16:52:40 2015 Ended ==> test5.sh Thu May 14 16:52:40 2015 Ended ==> test4.sh Thu May 14 16:54:56 2015 Started ==> test1.sh Thu May 14 16:54:56 2015 Started ==> test2.sh Thu May 14 16:54:56 CDT 2015 I am running test1.sh Thu May 14 16:54:56 CDT 2015 I am running test2.sh Thu May 14 16:54:56 CDT 2015 I am running test3.sh Thu May 14 16:54:56 2015 Started ==> test3.sh Thu May 14 16:54:56 2015 Ended ==> test2.sh Thu May 14 16:54:56 2015 Ended ==> test1.sh Thu May 14 16:54:56 2015 Ended ==> test3.sh Thu May 14 16:54:56 2015 Started ==> test4.sh Thu May 14 16:54:58 CDT 2015 I am running test4.sh Thu May 14 16:54:58 CDT 2015 I am running test5.sh Thu May 14 16:55:08 CDT 2015 I am running test4.sh Thu May 14 16:55:18 CDT 2015 I am running test4.sh etc.
During the 2-minute timeout wait to kill test4.sh, nothing else is happening (not even the run_on_start/finish for test5.sh). I still have 2 forkable processes (of the defined 3) that are not being used, I believe, because forkmanager is waiting for all the children to be done. I recognize that one process will be tied up for the timeout value, but I need the other two to continue processing available work (test1-3.sh and test5.sh). I'll take care to ensure test4 doesn't run again while there is one already running (using a hash of running jobs managed by the callbacks.

That is the crux of my problem.

Replies are listed 'Best First'.
Re^4: Parallel::ForkManager and wait_all_children
by ikegami (Patriarch) on May 15, 2015 at 14:43 UTC

    During the 2-minute timeout wait to kill test4.sh, nothing else is happening (not even the run_on_start/finish for test5.sh). I still have 2 forkable processes (of the defined 3) that are not being used

    That's completely false!!!

    on_start for test5 runs long before test4 finishes. Check the timestamps; your output is out of order. (You probably redirected the output, which caused STDOUT to become block buffered. Use $| = 1; to disable buffering on STDOUT.)

    Thu May 14 16:52:40 2015 Started ==> test4.sh Thu May 14 16:52:40 2015 Started ==> test5.sh Thu May 14 16:52:43 CDT 2015 I am running test4.sh Thu May 14 16:52:43 CDT 2015 I am running test5.sh [If you were to print out when test5 ends, it would be here] Thu May 14 16:52:53 CDT 2015 I am running test4.sh Thu May 14 16:53:03 CDT 2015 I am running test4.sh Thu May 14 16:53:13 CDT 2015 I am running test4.sh Thu May 14 16:53:23 CDT 2015 I am running test4.sh Thu May 14 16:53:33 CDT 2015 I am running test4.sh Thu May 14 16:53:43 CDT 2015 I am running test4.sh Thu May 14 16:53:53 CDT 2015 I am running test4.sh Thu May 14 16:54:03 CDT 2015 I am running test4.sh Thu May 14 16:54:13 CDT 2015 I am running test4.sh Thu May 14 16:54:23 CDT 2015 I am running test4.sh Thu May 14 16:54:33 CDT 2015 I am running test4.sh Thu May 14 16:54:43 CDT 2015 I am running test4.sh

    When test5 is running, there's only one unused process, and that's because there's nothing for that process to do. Only when test5 finishes do you have two unused processes, and that's because there's nothing for those two processes to do (since you want all processes to end before pausing before running the test suite again).

    You are right about on_finish, though. on_finish code is not guaranteed to run immediately when the child ends. It will run when the parent reaps the child, which will happen at some point before wait_all_children returns.

      Thanks for the $|. It definitely helps put things in perspective.

      I maintain that the other processes are, indeed, waiting behind the looping process to be killed. Here's why.

      All I did was change the TIMEOUT constant from 120 to 30.

      Here's the output at 120 seconds:

      Fri May 15 11:29:37 2015 Started ==> test1.sh Fri May 15 11:29:37 2015 Started ==> test2.sh Fri May 15 11:29:37 2015 Started ==> test3.sh Fri May 15 11:29:37 CDT 2015 I am running test1.sh Fri May 15 11:29:37 CDT 2015 I am running test2.sh Fri May 15 11:29:38 CDT 2015 I am running test3.sh Fri May 15 11:29:37 2015 Ended ==> test3.sh Fri May 15 11:29:37 2015 Started ==> test4.sh Fri May 15 11:29:39 CDT 2015 I am running test4.sh Fri May 15 11:29:37 2015 Ended ==> test2.sh Fri May 15 11:29:37 2015 Ended ==> test1.sh Fri May 15 11:29:37 2015 Started ==> test5.sh Fri May 15 11:29:40 CDT 2015 I am running test5.sh Fri May 15 11:29:37 2015 Ended ==> test5.sh Fri May 15 11:29:49 CDT 2015 I am running test4.sh Fri May 15 11:29:59 CDT 2015 I am running test4.sh Fri May 15 11:30:09 CDT 2015 I am running test4.sh Fri May 15 11:30:19 CDT 2015 I am running test4.sh Fri May 15 11:30:29 CDT 2015 I am running test4.sh Fri May 15 11:30:39 CDT 2015 I am running test4.sh Fri May 15 11:30:49 CDT 2015 I am running test4.sh Fri May 15 11:30:59 CDT 2015 I am running test4.sh Fri May 15 11:31:09 CDT 2015 I am running test4.sh Fri May 15 11:31:19 CDT 2015 I am running test4.sh Fri May 15 11:31:29 CDT 2015 I am running test4.sh Fri May 15 11:31:39 CDT 2015 I am running test4.sh Fri May 15 11:29:37 2015 Ended ==> test4.sh Fri May 15 11:31:51 2015 Started ==> test1.sh Fri May 15 11:31:51 2015 Started ==> test2.sh Fri May 15 11:31:51 2015 Started ==> test3.sh Fri May 15 11:31:52 CDT 2015 I am running test1.sh Fri May 15 11:31:52 CDT 2015 I am running test2.sh Fri May 15 11:31:52 CDT 2015 I am running test3.sh Fri May 15 11:31:51 2015 Ended ==> test3.sh Fri May 15 11:31:51 2015 Ended ==> test1.sh Fri May 15 11:31:51 2015 Ended ==> test2.sh
      Here's the output at 30 seconds:
      Fri May 15 12:58:08 2015 Started ==> test1.sh Fri May 15 12:58:08 2015 Started ==> test2.sh Fri May 15 12:58:08 2015 Started ==> test3.sh Fri May 15 12:58:08 CDT 2015 I am running test1.sh Fri May 15 12:58:09 CDT 2015 I am running test2.sh Fri May 15 12:58:09 CDT 2015 I am running test3.sh Fri May 15 12:58:08 2015 Ended ==> test2.sh Fri May 15 12:58:08 2015 Ended ==> test1.sh Fri May 15 12:58:08 2015 Ended ==> test3.sh Fri May 15 12:58:08 2015 Started ==> test4.sh Fri May 15 12:58:08 2015 Started ==> test5.sh Fri May 15 12:58:11 CDT 2015 I am running test4.sh Fri May 15 12:58:11 CDT 2015 I am running test5.sh Fri May 15 12:58:08 2015 Ended ==> test5.sh Fri May 15 12:58:21 CDT 2015 I am running test4.sh Fri May 15 12:58:31 CDT 2015 I am running test4.sh Fri May 15 12:58:41 CDT 2015 I am running test4.sh Fri May 15 12:58:08 2015 Ended ==> test4.sh Fri May 15 12:58:53 2015 Started ==> test1.sh Fri May 15 12:58:53 2015 Started ==> test2.sh Fri May 15 12:58:53 2015 Started ==> test3.sh Fri May 15 12:58:53 CDT 2015 I am running test1.sh Fri May 15 12:58:53 CDT 2015 I am running test2.sh Fri May 15 12:58:53 CDT 2015 I am running test3.sh Fri May 15 12:58:53 2015 Ended ==> test2.sh Fri May 15 12:58:53 2015 Ended ==> test1.sh Fri May 15 12:58:53 2015 Ended ==> test3.sh
      If it were true that the other two forkable processes were processing work as soon as it's available, why does it take over 2 minutes for test1-3,5 to run when the test4 timeout is 120 but only a bit over 30 secs when it's set to 30?
      This tells me that everything is waiting for the last process to finish in order to satisfy the wait_all_children. I would expect test4 to take TIMEOUT seconds to complete, but I would also expect test1-3,5 to run as soon as a forkable process is available (i.e., after the 10 second sleep in the while loop).

      What am I missing here?

        I maintain that the other processes are, indeed, waiting behind the looping process to be killed.

        Yes, just like you asked. Wait for all five processes to end/hang before sleeping ten seconds and re-running all five programs again.

        In both of the cases, there's less than 12s between the time the first batch of five finishes and the next one starts, so it's working fine.


        Why does it say test4.sh ended at 11:29:37 when it was still producing output at 11:31:39?

        Also, again, it would really help if you printed when the processes exited if you're going to keep discussing this.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1126701]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2024-04-24 00:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found