PerlMonks  

Parallel::ForkManager takes too much time to start 'finish' function

by ohadk (Initiate)
on Oct 29, 2018 at 13:33 UTC ( [id://1224835]=perlquestion )

ohadk has asked for the wisdom of the Perl Monks concerning the following question:

I use the regular Parallel::ForkManager module in Perl and execute around 10 child processes. The 'passes_thresholds' function takes a few milliseconds or nanoseconds (I checked it). If I run all the jobs one by one (without Parallel::ForkManager), the whole run takes 80-250 milliseconds. If I run them in parallel, the whole run takes at least 1 second. I found that the fork manager spends about 1 second before the 'finish' callback starts. I start a timer when the child process has finished its job and is about to call 'finish'. One second is too much for my application.
sub parallel_execute {
    my $this = shift;
    foreach my $a (@a_array) {
        my $pid = $this->{fork_manager}->start and next;
        my $res = $a->passes_thresholds();
        # Start the timer just before the child hands its result back.
        $a->{timer} = Benchmark::Timer->new();
        $a->{timer}->start;
        $this->{fork_manager}->finish(0, {a => $a, plugin_result => $res});
    }
}

$this->{fork_manager}->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
    my $a = $data_structure_reference->{a};
    if (exists $a->{timer}) {
        # Stop the timer once the parent runs the callback.
        $a->{timer}->stop;
        debug "took: " . $a->{timer}->report;
    }
});
Do you have any idea why it took at least 1 second to start the 'finish' command? (I am using Unix server, and perl 5.10)

Replies are listed 'Best First'.
Re: Parallel::ForkManager takes too much time to start 'finish' function
by Corion (Patriarch) on Oct 29, 2018 at 13:35 UTC

    Cross posted at Stackoverflow.

    While we don't mind the occasional cross-post, please tell people so that there is no double work.

    The usual approach is to give each forked child more work to do, by sending each child a batch of items to work on instead of launching a child for every work item. Launching and tearing down a child takes time, so ideally you don't do that too often.
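A minimal sketch of the batching idea, assuming the work items are independent of each other. The item list, the batch size calculation, and the `process_item` helper are hypothetical stand-ins, not the original poster's code:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical work items and per-item job; stand-ins for the real checks.
my @items = (1 .. 10);
sub process_item { my ($item) = @_; return $item * 2 }

my $max_children = 2;    # assumption: tune this to the machine
my $pm = Parallel::ForkManager->new($max_children);

my %results;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    # $data is the hash ref the child passed to finish(), if any.
    %results = (%results, %$data) if defined $data;
});

# One batch per child instead of one child per item.
my $batch_size = int((@items + $max_children - 1) / $max_children);
my @work = @items;
while (my @batch = splice(@work, 0, $batch_size)) {
    $pm->start and next;                         # parent: go on to the next batch
    my %partial = map { $_ => process_item($_) } @batch;
    $pm->finish(0, \%partial);                   # child: hand results back and exit
}
$pm->wait_all_children;

print "$_ => $results{$_}\n" for sort { $a <=> $b } keys %results;
```

With ten items and two children this forks twice rather than ten times, so the per-child startup and teardown cost is paid once per batch.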

      I agree. That's the same reason why webservers use pre-forking. Which, depending on the task at hand, could also be a possible solution for this problem.

      Fork the children at the start, send them work and have the parent just manage the number of currently running children to always keep some spares, but reap any unneeded extras. Done right, this can even reduce the total number of processes running at any one time. Again, depending on how long the task takes, it might be more efficient to wait for a child to finish its current task and directly work on the next in the queue instead of spinning up a new child.
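A rough sketch of such a pre-forked pool, using only core Perl (`fork`, `socketpair`, `IO::Select`) rather than Parallel::ForkManager. The task list, the worker count, and the `do_task` helper are hypothetical, and it keeps a fixed number of workers instead of growing or shrinking the pool:

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Socket;

my @tasks       = (1 .. 10);    # hypothetical work items
my $num_workers = 3;            # assumption: fixed pool size
sub do_task { my ($n) = @_; return $n * $n }    # hypothetical job

my $select = IO::Select->new;
my (@pids, @socks);
for (1 .. $num_workers) {
    socketpair(my $child_end, my $parent_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    $_->autoflush(1) for $child_end, $parent_end;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Worker: drop inherited parent-side handles, then serve tasks until EOF.
        close $parent_end;
        close $_ for @socks;
        while (my $line = <$child_end>) {
            chomp $line;
            print {$child_end} "$line ", do_task($line), "\n";
        }
        exit 0;
    }
    close $child_end;
    push @pids,  $pid;
    push @socks, $parent_end;
    $select->add($parent_end);
}

# Give every worker one task, then hand out the rest as answers come back.
my %result;
my $outstanding = 0;
for my $sock (@socks) {
    last unless @tasks;
    print {$sock} shift(@tasks), "\n";
    $outstanding++;
}
while ($outstanding) {
    for my $sock ($select->can_read) {
        my $line = <$sock>;
        die "worker exited unexpectedly" unless defined $line;
        chomp $line;
        my ($task, $answer) = split ' ', $line;
        $result{$task} = $answer;
        $outstanding--;
        if (@tasks) {
            print {$sock} shift(@tasks), "\n";
            $outstanding++;
        }
    }
}

close $_ for @socks;          # EOF tells each worker to exit
waitpid($_, 0) for @pids;
print "$_ squared is $result{$_}\n" for sort { $a <=> $b } keys %result;
```

Each worker has at most one task in flight, which keeps the line-based protocol simple. A production pool would also need error handling for workers that die mid-task.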

      And it's always a good idea to have some code in place to manage and limit the number of concurrent tasks. The moment your system runs out of RAM and starts swapping stuff to disk, all hope is lost for speedy performance. Same goes with any other resources. And it could lead to other rather unfortunate "features".

      I once had to do a lengthy database repair, because someone-who-shall-not-be-named-but-looks-like-me had a major memory leak and the Linux kernel started killing random tasks, including some rather essential PostgreSQL processes.

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
Re: Parallel::ForkManager takes too much time to start 'finish' function
by ikegami (Patriarch) on Oct 31, 2018 at 01:39 UTC

    The run_on_finish callback is only called when P::FM reaps a child, and P::FM only reaps a child under three conditions:

    • When $pm->start is called and the number of children that have been started but not reaped is equal to the maximum.
    • When $pm->reap_finished_children is called.
    • When $pm->wait_all_children is called.

    There could be an arbitrarily long delay between a child exiting and one of the above events. Adding the following to your program should eliminate that delay:

    $SIG{CHLD} = sub { $pm->reap_finished_children };
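If the parent already has a loop of its own, the second condition in the list above can also be triggered explicitly instead of from a signal handler. A small sketch, assuming a Parallel::ForkManager version that provides `reap_finished_children`; the child bodies, the identifiers 1 to 3, and the poll limit are hypothetical:

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(sleep);

my $pm = Parallel::ForkManager->new(4);

my @finished;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    push @finished, $ident;
});

for my $id (1 .. 3) {
    $pm->start($id) and next;    # $id comes back to the callback as $ident
    $pm->finish(0);              # hypothetical child with nothing to do
}

# The parent carries on with its own work and polls for exited children.
my $polls_left = 200;            # hypothetical upper bound on polling
while (@finished < 3 && $polls_left-- > 0) {
    $pm->reap_finished_children; # non-blocking; runs the callback for each reaped child
    sleep 0.01;                  # stand-in for real work in the parent
}
$pm->wait_all_children;          # blocks until any stragglers are reaped

print "finished: @{[ sort @finished ]}\n";
```

The callback then runs within one polling interval of the child exiting, rather than whenever the next `start` or `wait_all_children` happens to occur.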

    By the way, if the work performed by your child only takes "a few milliseconds or nanoseconds", you are actually slowing things down by using P::FM. Data passed to `finish` gets serialized and written to disk, then read from the disk and deserialized for the `run_on_finish` callback!
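When a child only needs to report pass or fail, one way to sidestep that round trip is to pass just an exit code to `finish` and read it in the callback, so no data structure needs to be serialized. This is a sketch under that assumption; the check names, values, and threshold are hypothetical:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical checks: a name and a measured value each.
my @checks    = ([low => 3], [high => 42], [mid => 10]);
my $threshold = 10;    # hypothetical limit

my $pm = Parallel::ForkManager->new(3);

my %passed;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    # Exit code 0 means the check passed; anything else means it failed.
    $passed{$ident} = ($exit_code == 0) ? 1 : 0;
});

for my $check (@checks) {
    my ($name, $value) = @$check;
    $pm->start($name) and next;                   # $name is the child's identifier
    $pm->finish($value <= $threshold ? 0 : 1);    # no data reference passed
}
$pm->wait_all_children;

print "$_: ", ($passed{$_} ? "pass" : "fail"), "\n" for sort keys %passed;
```

This only helps when the result fits in an exit code. Anything richer still has to go through the data structure argument of `finish`.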

Re: Parallel::ForkManager takes too much time to start 'finish' function
by ohadk (Initiate) on Oct 31, 2018 at 10:51 UTC
    Thank you all, I found the problem. The ForkManager module has a 'waitpid_blocking_sleep' parameter, which is 1 second by default. There is a method called 'set_waitpid_blocking_sleep' with which we can set this parameter (the sleep time) to zero or to a fraction of a second. I set this parameter to zero and it fixed my issue.
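For reference, a minimal sketch of that fix, assuming a Parallel::ForkManager version that provides `set_waitpid_blocking_sleep`. The child count and the empty child bodies are hypothetical, and the measured time will vary by system:

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(time);

my $pm = Parallel::ForkManager->new(10);

# The default sleep between waitpid polls is 1 second; 0 disables the sleep.
$pm->set_waitpid_blocking_sleep(0);

my $count = 0;
$pm->run_on_finish(sub { $count++ });

my $t0 = time;
for my $n (1 .. 10) {
    $pm->start and next;
    $pm->finish(0);      # hypothetical child with nothing to do
}
$pm->wait_all_children;

printf "reaped %d children in %.3f s\n", $count, time - $t0;
```

A small fraction such as 0.01 is the other option mentioned above, if a short polling sleep is preferred over disabling it.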
