Hi marioroy,
As you said, it's apples to oranges, so let's try to be more appleish (or is it orangeish?).
First attempt was to group job ids so a child does more than one during its lifetime. Turns out to be fairly simple.
Right after I posted Re: Parallel::ForkManager and multiple datasets, I realized I had written roughly the same forking code several times, so it was time
to move it to a module.
Here's the module. It uses callbacks for the child code and for the parent code that processes the child's returned value.
package Forking::Amazing;
sub run ($&&@)
{
my ( $maxforks, $childcallback, $resultcallback, @ids ) = @_;
use Storable qw( freeze thaw );
use IO::Select;
my %fh2id;
my $sel = IO::Select->new;
while( @ids or $sel->count ) # unstarted or active
{
while( @ids and $sel->count < $maxforks ) # start all forks allowe
+d
{
my $id = shift @ids;
if( open my $fh, '-|' ) # forking open
{
$sel->add( $fh ); # parent
$fh2id{$fh} = $id;
}
else # child code goes here
{
print freeze $childcallback->($id);
exit;
}
}
for my $fh ( $sel->can_read ) # collecting child data
{
$sel->remove( $fh );
$resultcallback->($fh2id{$fh}, thaw do { local $/; <$fh> });
}
}
}
1;
__END__
=head1 EXAMPLE program
use Forking::Amazing; # small example program
use Data::Dump 'dd';
Forking::Amazing::run(
5, # max forks
sub { +{id => pop, pid => $$} }, # runs in child
sub {dd pop}, # process result of child in pare
+nt
'a'..'z'); # ids (one fork for each id)
=cut
The module name may change in the future. :)
Here's code using that module that does grouping of job ids.
The id passed to the child is now an anon array of job ids, and a child now returns
an anon array of results.
#!/usr/bin/perl
use strict;
use warnings;;
use Forking::Amazing;
use Data::Dump 'dd';
use Time::HiRes qw(time);
my $groupsize = 1000;
my @rawids = 'job0001' .. 'job9999';
my @ids;
push @ids, [ splice @rawids, 0, $groupsize ] while @rawids;
my @answers;
my $start = time;
Forking::Amazing::run
20,
sub { [ map +{id => $_, pid => $$, time => time - $start}, @{+shift}
+ ] },
sub { push @answers, @{+pop} },
@ids;
my $end = time - $start;
dd \@answers;
print "forking time $end\n";
It's a significant speed up :)
Note that I moved the dd out of the timing loop, since the dd takes over 1.5 seconds to run on my machine and swamps the forking time.
|