http://qs321.pair.com?node_id=463878

chanakya has asked for the wisdom of the Perl Monks concerning the following question:

Greetings everyone!
I have a script that looks in a pre-defined log directory, gets the file names,
processes the files one by one, and removes each processed file from the log directory.
Now I want to split the processing of the log files across multiple sub-processes, because the current processing takes too long.

What I want to do is this: when X.pl executes, it will look into the log directory and get
the file names there. For example, if X.pl finds 3 log files, it
will fork 3 sub-processes of Y.pl and pass each log file as input to one of the Y.pl sub-processes.

X.pl has to wait until all the sub-processes return. Once they have all returned, the parent (X.pl) gets the newer file names and the process continues. Most importantly, all three sub-processes should run at the same time, not one after another.

Can anyone tell me how to implement the forking, and how to make the parent script wait until all the sub-processes return?
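A minimal sketch of the pattern being asked for (the file list is a placeholder, and the commented-out exec of Y.pl is hypothetical): the parent forks one child per log file, each child handles its file and exits, and the parent blocks in waitpid until every child has been reaped.

```perl
use strict;
use warnings;

# Hypothetical file list; in X.pl this would come from reading the log directory.
my @logs = ('a.log', 'b.log', 'c.log');

my @pids;
for my $file (@logs) {
    defined( my $pid = fork ) or die "fork failed: $!";
    if ($pid == 0) {                 # child
        # Here X.pl could hand the file to Y.pl, e.g.:
        # exec 'perl', 'Y.pl', $file or die "exec failed: $!";
        exit 0;
    }
    push @pids, $pid;               # parent remembers each child's PID
}

# Parent blocks until every child has exited; only then does it continue.
waitpid $_, 0 for @pids;
print "all children done\n";
```

All children run concurrently, because the parent only starts waiting after the forking loop has launched every one of them.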

Re: Forking and running all child processes at the same time
by robartes (Priest) on Jun 06, 2005 at 12:30 UTC

    As an aside -- why do you think processing the log files in parallel will increase throughput? If the log parsing is I/O bound, you'll only generate more I/O contention and possibly even increase overall execution times. If the process is CPU bound, you will need to run this on a multiprocessor machine (of whatever flavour is available) to benefit. If it's a single-processor machine, you'll only benefit if the parsing is just the right combination of I/O vs CPU bound, and then only if you keep the number of concurrent processes quite low.

    So, in short, if you're not running on a multiprocessor machine, make sure that this is not a case of premature optimisation.

    CU
    Robartes-

Re: Forking and running all child processes at the same time
by Fletch (Bishop) on Jun 06, 2005 at 12:18 UTC
Re: Forking and running all child processes at the same time
by salva (Canon) on Jun 06, 2005 at 12:28 UTC
    use Proc::Queue size => 10, qw(run_back all_exit_ok);

    while (...) {
        my @logfn = get_log_names();
        my @pids  = map { run_back { process_log $_ } } @logfn;
        all_exit_ok(@pids)
            or print STDERR "some child failed\n";
    }
Re: Forking and running all child processes at the same time
by reneeb (Chaplain) on Jun 06, 2005 at 12:25 UTC
    use POSIX ":sys_wait_h";               # for WNOHANG

    my @files = qw(file1 file2 file3);     # your log-files
    my %pids;
    for (1 .. scalar @files) {
        next unless -e $files[$_ - 1];
        my $pid = fork();
        if ($pid == -1) {
            warn($!);
            last;
        }
        if ($pid) {
            $pids{$pid} = 1;
        }
        else {
            # do what you want with $files[$_-1]
            exit(0);
        }
    }
    while (keys %pids) {
        my $pid = waitpid( -1, WNOHANG );
        die "$!" if $pid == -1;
        delete $pids{$pid};
    }

      Why the test for $pid==-1 ? From perldoc -f fork:

      returns the child pid to the parent process, 0 to the child process, or "undef" if the fork is unsuccessful.
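In other words, a failed fork shows up as undef, so it is a defined() check (rather than a comparison with -1) that actually catches the failure; a minimal sketch:

```perl
use strict;
use warnings;

my $pid = fork;
die "fork failed: $!" unless defined $pid;   # undef, not -1, signals failure

if ($pid == 0) {
    # child branch: fork returned 0 here
    exit 0;
}

# parent branch: fork returned the child's PID
my $got = waitpid $pid, 0;
print "child reaped\n" if $got == $pid;
```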

      the lowliest monk

Re: Forking and running all child processes at the same time
by tlm (Prior) on Jun 06, 2005 at 12:33 UTC
    for my $file ( <*.log> ) {    # for example...
        defined( my $pid = fork ) or die "Unable to fork: $!";
        unless ($pid) {           # in child
            # process $file
            exit;
        }
    }
    1 while wait > 0;             # avoid zombies

    the lowliest monk

      Hi
      As pointed out by BrowserUK in this node Re: Unix Review column, your:
      1 while wait > 0; # avoid zombies
      won't work on Windows systems. The Windows implementation uses the negative of the thread ID as the pid, so wait returns negative numbers for all children.

      It seems the most cross-platform wait statement is:
      1 while wait != -1 ; # avoid zombies
      since the main thread has ID 1, the child threads won't have that ID, and -1 is what wait natively returns when there are no further child processes.
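The portable idiom as a runnable sketch (the children here just exit immediately): on Unix, wait returns each child's positive PID and then -1 when none remain, so comparing against -1 reaps everything and terminates; the same comparison also survives Windows' negative pseudo-PIDs.

```perl
use strict;
use warnings;

# Fork a few children that exit immediately.
for (1 .. 3) {
    defined( my $pid = fork ) or die "fork failed: $!";
    exit 0 if $pid == 0;    # child
}

# Reap every child; wait returns -1 once no children remain.
# Comparing against -1 (rather than > 0) also works where fork is
# emulated with threads and the returned pseudo-PIDs are negative.
my $reaped = 0;
$reaped++ while wait != -1;
print "reaped $reaped children\n";
```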

      - j