Thank you, ambrus. This is precisely the example I needed to help me get started.
It wasn't clear enough from my original post that my problem isn't just that I don't understand how to do Windows process control using Perl. My problem is that I don't understand process control well at all. And when I read about it in documentation—not just Perl documentation, any documentation—my head explodes. I struggle with the unfamiliar lingo. If there's a good tutorial for absolute beginners, I haven't found it yet. But with the help of your straightforward Perl code snippet, I was able to make a good start.
So here's the script I cobbled together based on your example. It has extra junk in it that's only there for self-educational purposes. Also, there are actually thousands of lines of DATA (i.e., external commands to be run), not just these few.
use strict;
use warnings;
use English qw( -no_match_vars ); # For $CHILD_ERROR
use POSIX ();

my $BATCH_SIZE = 8;
my @commands;

LINE:
while (<DATA>) {
    next LINE if m/^\s*#/;
    chomp;
    my ($txt_file, $tab_file, $total_documents) = split m/,/, $_, 3;
    my $command = "doit $txt_file > $tab_file";
    push @commands, [ $command, $txt_file, $total_documents ];
}

while (@commands) {
    my @pids;
    my %txt_file_by;

    for my $cmd (splice @commands, 0, $BATCH_SIZE) {
        my ($command, $txt_file, $total_documents) = @$cmd;
        my $pid = system(1, $command);
        push @pids, $pid;
        my $timestamp = POSIX::strftime('%H:%M:%S', localtime);
        print "$timestamp\t$pid\t$command\n";
        $txt_file_by{$pid} = $txt_file;
    }

    for my $pid (@pids) {
        $pid == waitpid($pid, 0) or die;
        die if $CHILD_ERROR;
        my $timestamp = POSIX::strftime('%H:%M:%S', localtime);
        print "$timestamp\t$pid\t$txt_file_by{$pid}\n";
    }
}

exit 0;
__DATA__
D000349000.txt,D000349000.tab,564530
Z0000042.txt,Z0000042.tab,457277
Z0000013336.txt,Z0000013336.tab,457277
Z0000013426.txt,Z0000013426.tab,382292
D000250000.txt,D000250000.tab,382014
C000004770.txt,C000004770.tab,356580
Z000003462.txt,Z000003462.tab,356580
Z000004770.txt,Z000004770.tab,356580
Z0000012073.txt,Z0000012073.tab,349325
D000303000.txt,D000303000.tab,347852
Z0000013787.txt,Z0000013787.tab,347852
Z0000014288.txt,Z0000014288.tab,289025
D004607000.txt,D004607000.tab,268763
D000245000.txt,D000245000.tab,258363
Z0000012214.txt,Z0000012214.tab,257861
Z0000013342.txt,Z0000013342.tab,257861
Z0000015322.txt,Z0000015322.tab,243612
D000275000.txt,D000275000.tab,242962
D000272000.txt,D000272000.tab,224791
D000271000.txt,D000271000.tab,223537
D000717000.txt,D000717000.tab,216624
Z0000015315.txt,Z0000015315.tab,215390
D004457000.txt,D004457000.tab,211271
Z0000012004.txt,Z0000012004.tab,211271
Until I implemented this, ran it, and watched it closely in action, I couldn't figure out either system() or waitpid(). I don't grok them, but I more-or-less understand what they're accomplishing. It's still unclear to me what the first argument of system(), 1, is for, and I also don't understand what the second argument of waitpid(), 0, is intended to do. An explanation of these mysterious arguments would be helpful.
What are examples of appropriate messages to use with the two calls to die()? I don't fully understand what's being tested and could fail at those points in the script. More generally, how might I flesh out the error handling in the script to make it more robust?
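As a rough sketch of what fuller messages might look like: the helper below (describe_status is a made-up name, not anything from ambrus's example) turns a status value into readable text. It assumes the status in $CHILD_ERROR follows the layout described in perlvar, with the exit code in the high byte and a signal number in the low seven bits. Whether the signal part ever applies on Windows is something I'm not sure of.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a wait status (the value of $CHILD_ERROR, i.e. $?).
sub describe_status {
    my ($status) = @_;
    return 'could not be started or waited for' if $status == -1;
    my $signal    = $status & 127;   # low seven bits: signal number, if any
    my $exit_code = $status >> 8;    # high byte: the command's own exit code
    return "was killed by signal $signal" if $signal;
    return "exited with code $exit_code";
}

# Demonstration only: run a child Perl that deliberately exits with code 3.
my $command = qq{"$^X" -e "exit 3"};
system($command);
if ($? != 0) {
    warn "'$command' " . describe_status($?) . "\n";
}
```

In the batch script, the second die could then read something like die "'$txt_file_by{$pid}' " . describe_status($CHILD_ERROR), and the first could report the pid that waitpid() was asked about along with $OS_ERROR.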
What's the difference between a process and a thread? When and why would I choose to use multiple processes rather than multiple threads and vice versa? I'm running Microsoft Windows, not Unix or Linux. How much does this matter?
If there's an easier or slicker way to compute a timestamp than how I did it here using POSIX::strftime() and localtime(), I'd appreciate a tip.
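Two alternatives I'm aware of, sketched here for comparison only: the core Time::Piece module (bundled with Perl since 5.10) and a plain sprintf() over a slice of localtime()'s list. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Time::Piece;   # replaces localtime() with a version that returns an object

# Object style: hms() returns the time of day as HH:MM:SS.
my $now            = localtime;
my $via_time_piece = $now->hms;

# No-module style: pick hour, minute, second out of the built-in's list.
my $via_sprintf = sprintf '%02d:%02d:%02d', (CORE::localtime())[2, 1, 0];

print "$via_time_piece\n$via_sprintf\n";
```

Neither is obviously shorter than POSIX::strftime('%H:%M:%S', localtime), so the original may already be about as simple as it gets.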
Thank you again for your help.
Jim