0xbeef has asked for the wisdom of the Perl Monks concerning the following question:
I have a program that collects output from OS commands by forking a child process and reading its stdout/stderr separately via a pipe. The duration and output of each exec() may not exceed a certain threshold; otherwise the exec'd child gets killed and the next system command gets run.
I have code that does this _slowly_ - probably because I'm not buffering the reads from the pipe, but handling each line of output individually in order to measure size/duration. I cannot think of any other way of enforcing time/size thresholds on the child process...
I'm hoping an enlightened monk would have a clever trick for something like this?
Conditions:
1. Child process' stdout and stderr must be processed in separate handles.
2. exec'd child cannot run for longer than xx secs
3. exec'd child's stdout and stderr must not exceed x lines/size
0xbeef
Re: Redirecting stdout/stderr to pipe
by betterworld (Curate) on Sep 19, 2005 at 11:49 UTC
As for the runtime restriction, you may use bash's ulimit command if you are running Unix. However, you can only restrict the cpu time with ulimit, not the idle time.
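Since ulimit is a shell builtin, one way to apply it from Perl is to wrap the command in a shell invocation. Here's a minimal sketch of that idea; the command and the 1-second cap are made-up examples, not code from the thread:

```perl
#!/usr/bin/perl
# Sketch: cap the child's CPU time via the shell's ulimit builtin.
# Assumes a POSIX shell is available; note this limits CPU time,
# not wall-clock (idle) time, as pointed out above.
use strict;
use warnings;

my $cmd = q{perl -e 'while (1) { }'};   # example CPU-burning command
# exec replaces the shell, so the limit applies to the command itself
my $rc = system('sh', '-c', "ulimit -t 1; exec $cmd");
printf "exit status %d, killed by signal %d\n", $rc >> 8, $rc & 127;
```

On Linux the child is typically killed by SIGXCPU once it exceeds the limit; the exact signal can vary by platform.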
Re: Redirecting stdout/stderr to pipe
by Roy Johnson (Monsignor) on Sep 19, 2005 at 16:07 UTC
alarm is the usual way to set a timeout. Set it when you fork, and reset it if the child exits in time.
Use select to poll for available data on each of your filehandles, adding to a variable to keep track of the amount of output they've produced.
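A minimal sketch of the approach described above - a wall-clock deadline plus a running count of the child's output. Here the deadline is enforced through select's timeout argument rather than a delivered alarm signal (which avoids interrupting sysread mid-call); the command and the limits are illustrative assumptions, not code from the thread:

```perl
#!/usr/bin/perl
# Sketch: poll a child's output pipe with 4-arg select, enforcing a
# time cap (via the select timeout) and a size cap (via a byte count).
use strict;
use warnings;

my $timeout   = 5;      # seconds (example value)
my $max_bytes = 4096;   # output cap (example value)

pipe(my $rd, my $wr) or die "pipe: $!";
my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {                          # child: send stdout down the pipe
    close $rd;
    open STDOUT, '>&', $wr or die "dup: $!";
    exec 'echo', 'hello from child' or exit 1;
}
close $wr;

my ($total, $buf) = (0, '');
my $deadline = time + $timeout;
while (1) {
    my $left = $deadline - time;
    last if $left <= 0;                   # time cap reached
    my $rin = '';
    vec($rin, fileno($rd), 1) = 1;
    my $rout = $rin;
    my $n = select($rout, undef, undef, $left);
    next unless $n;                       # timed out this round; re-check deadline
    my $got = sysread $rd, my $chunk, 1024;
    last unless $got;                     # EOF (0) or error (undef)
    $buf   .= $chunk;
    $total += $got;
    last if $total >= $max_bytes;         # size cap reached
}
kill 'TERM', $pid;                        # stop the child if still running
waitpid $pid, 0;
print $buf;
```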
Caution: Contents may have been coded under pressure.
Re: Redirecting stdout/stderr to pipe
by Thelonius (Priest) on Sep 19, 2005 at 19:36 UTC
I think your main problem is that select does not interact well with buffered I/O, which you use via <READERR> and <READOUT>. You should really use sysread and do a split /\n/ at the very end.
Here's how I might do it, using IPC::Open3 and IO::Select. You should add the waitpid/close from your example:
#!perl -w
use IPC::Open3;
use IO::Select;
use Symbol qw(gensym);
use strict;
my $cmd = shift;
my $g_timeout = shift || 5;
my $g_maxlines = shift || 10;
my @output;
my @err;
$| = 1;
runcmd(\$cmd, \@output, \@err);
print "output = \n";
print " $_\n" for @output;
print "error = \n";
print " $_\n" for @err;
sub runcmd {
my ($cmdref, $outref, $errref) = @_;
my ($childin, $childout, $childerr);
$childerr = gensym;
my $pid = open3($childin, $childout, $childerr, $$cmdref)
or die "Cannot run cmd '$$cmdref': $!\n";
my $select = IO::Select->new or die "Cannot create select object: $!\n";
my @hold_output;
for (($childout, $childerr)) {
$select->add($_);
$hold_output[fileno($_)] = [0, 0, ""]; # eof, lines, buffer;
}
$@ = undef;
eval {
local $SIG{ALRM} = sub { die "alarm\n" };
my $deadline = $g_timeout + time;
my $g_stop = 0;
alarm($g_timeout + 1);
while (!$g_stop && $select->count > 0) {
$! = 0;
my @ready = $select->can_read($deadline - time);
if (!@ready) {
$g_stop = 1;
}
for my $handle (@ready) {
my $fno = fileno($handle);
my $line;
my $bytesread = sysread $handle, $line, 1024;
if ($bytesread) {
$hold_output[$fno][2] .= $line;
$hold_output[$fno][1] += $line =~ y/\n/\n/;
if ($hold_output[$fno][1] >= $g_maxlines) {
$select->remove($handle);
}
} elsif ($!) {
die "$!";
} else {
$hold_output[$fno][0] = 1; #EOF
$select->remove($handle);
}
}
}
alarm(0);
};
if ($@) {
# print STDERR "\$\@ = $@\n";
die unless $@ eq "alarm\n";
}
# Note: lines may exceed $g_maxlines because of buffering
# of output in the child process
@$outref = split /\n/, $hold_output[fileno($childout)][2];
@$errref = split /\n/, $hold_output[fileno($childerr)][2];
}
Okay, thanks for that! I would, however, like to be enlightened some more: ;)
1. I'd rather not use a solution using external modules, since the collector will have to run on MANY hosts as painlessly and non-intrusively as possible. Is there a high performance approach without external modules?
2. I am unsure as to how much data should be read from the pipe at a time in order to optimise throughput as much as possible...
-0xbeef
1. I'd rather not use a solution using external modules, since the collector will have to run on MANY hosts as painlessly and non-intrusively as possible. Is there a high performance approach without external modules?
The modules cited in the code that Thelonius posted (IPC::Open3, IO::Select, Symbol) are not "external", in the sense that they are all included in the standard Perl "core" distribution. That is, wherever a reasonably current version of Perl is installed, these modules are also installed by default.
(If these many hosts you speak of have non-standard or hopelessly outdated perl installations, that's going to be a problem anyway.)
As for buffer size, the 1024 bytes suggested by Thelonius is fine for handling text that must ultimately be treated in a line-oriented fashion; if you're dealing with really high data rates, an 8k buffer should be about optimal.
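A tiny illustration of that pattern - reading in 8k chunks with sysread and splitting into lines only once, at the end. The printf command here is just a stand-in for a real child process:

```perl
#!/usr/bin/perl
# Sketch: unbuffered chunked reads from a child, line-split at the end.
use strict;
use warnings;

open my $fh, '-|', 'printf', "line1\nline2\nline3\n" or die "open: $!";
my ($data, $chunk) = ('', '');
while (sysread($fh, $chunk, 8192)) {   # 8k buffer, per the suggestion above
    $data .= $chunk;
}
close $fh;
my @lines = split /\n/, $data;
print scalar(@lines), " lines\n";
```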
Re: Redirecting stdout/stderr to pipe
by liverpole (Monsignor) on Sep 19, 2005 at 13:07 UTC
Can you provide some of the code you've tried? (If it's very large, try carving it down to the salient functionality). The purpose of supplying this information is threefold -- first, it shows what you've tried or haven't tried; second, you may have a bug, or missed some subtle point which someone else can identify, and third, it gives those who might respond a starting point for thinking about a solution.
pipe READOUT,WRITEOUT;
pipe READERR,WRITEERR;
$g_pid = fork();
if ( $g_pid < 0 ) { # Error
die "Error: fork failed with error $g_pid: $!.";
$rc = 1;
} elsif ( $g_pid == 0 ) { # Child
close(READOUT);
close(READERR);
close(STDOUT);
open(STDOUT,">&WRITEOUT");
close(STDERR);
open(STDERR,">&WRITEERR");
close(STDIN);
open(STDIN,"</dev/null");
exec($$cmdref);
exit(1);
} else { # Parent
$g_did_run++;
$g_pids{$g_pid} = 1;
close(WRITEOUT);
close(WRITEERR);
if ( $g_timeout > 0 ) {
$g_stop = 0;
alarm $g_timeout;
}
my $r_in = "";
my $w_in = "";
vec($r_in,fileno(READOUT),1) = 1;
vec($r_in,fileno(READERR),1) = 1;
vec($w_in,fileno(STDOUT),1) = 1;
my $e_in = $r_in | $w_in;
my ($r_out,$w_out,$e_out);
my $cnt=0;
my $eof_e = 0;
my $eof_o = 0;
while (1) {
my ($nfound,$timeleft) = select($r_out=$r_in,undef,$e_out=$e_in,1);
prog_log(20,sprintf("run_cmd: select returned time [%d] found [$nfound] timeleft [$timeleft] r_out [%s] r_err [%s] e_out [%s] e_err [%s]\n", time,
vec($r_out,fileno(READOUT),1)?'y':'n',
vec($r_out,fileno(READERR),1)?'y':'n',
vec($e_out,fileno(READOUT),1)?'y':'n',
vec($e_out,fileno(READERR),1)?'y':'n'));
if ( vec($r_out,fileno(READERR),1) and !vec($e_out,fileno(READERR),1) ) {
if ( eof(READERR) ) {
$eof_e = 1;
} else {
my $line = <READERR>;
chomp $line;
push @$errref, $line;
}
}
if ( vec($r_out,fileno(READOUT),1) and ! vec($e_out,fileno(READOUT),1) ) {
if ( eof(READOUT) ) {
$eof_o = 1;
} else {
my $line = <READOUT>;
chomp $line;
push @$outref, $line;
}
}
last if ( $eof_e and $eof_o );
last if ( $g_stop );
}
if ( $g_stop ) { # Indicator for select loop to break
$rc = 2;
} else {
my $loglvl = 5;
my $t_pid = waitpid($g_pid,0);
if ( $t_pid < 0 ) {
$rc = 3;
} elsif ( $t_pid == 0 ) {
# Process not terminated yet!
$loglvl = 1;
$rc = 4;
} elsif ( $t_pid == $g_pid ) {
$rc = $?;
} else {
# Not possible!
$loglvl = 1;
$rc = 5;
}
close(READOUT);
close(READERR);
delete $g_pids{$g_pid};
prog_log($loglvl,"run_cmd: waitpid($g_pid) returned $t_pid - rc $rc.");
}
Thanks in advance! -0xbeef
Minor nit-pick that probably doesn't answer your immediate problem: "fork()" does not return a negative number; failure is indicated by returning undef. So your initial "if" condition should be:
if ( ! defined( $g_pid ))
Apart from that, you seem to be using variables with global scope for flow-control, and you didn't show how/whether these were initialized ($g_stop, $g_timeout). So it's hard to guess whether it's a problem that $g_stop is never assigned a value within the "while(1)" loop. Also, you didn't include whatever code you have (if any) that actually handles the event generated by "alarm()". Could that be relevant?