Thanks a tremendous amount, everyone. We figured it out, or rather, we refactored some 6-year-old code.
This was the process tree showing PID and PPID at the beginning of every line:
1234 1 do-upgrades.pl <--- daemon
2345 1234 do-upgrades.pl <--- forked process with system() call
2346 2345 upgrade-sensor.pl <--- worker bee
On every pass through the daemon's endless while(1) loop, it stopped at waitpid:
# read everything from the pipe.
my $pipename = "sensor-fifo";
while (1) {
chomp(my $sensor = `cat $pipename`);
# Something in pipe; see if it is real or just empty string
# [Omitted more sophisticated tests here]
if ($sensor) {
print "upgrade requested for $sensor, forking...\n";
my $f=fork;
if(defined ($f) and $f==0) { # I'm a child
# Here you can use the shell meta-characters
my $result = system("${perlhome}/upgrade-sensor.pl $sensor >> ${workpath}/worklog 2>&1");
# System is a blocking call, meaning that the
# intermediate process does not exit.
print "result for $sensor was $result.\n";
exit(0);
} elsif (defined ($f)==0) { # Fork failed
print "Fork failed for $sensor!\n";
}
# else I'm the parent
}
print "waiting on child $$ to complete $sensor.\n";
1 while waitpid(-1, WNOHANG)>0; # reaps children
print "Finished waiting.\n";
}
The purpose of the intermediate process was not clear. Once the system() call completes, it prints the result and exits. Its sole purpose seems to be capturing $result in the daemon's log, which we don't care about. So we switched over to exec():
# read everything from the pipe.
my $pipename = "sensor-fifo";
while (1) {
chomp(my $sensor = `cat $pipename`);
my @args = split / /,$sensor;
# Something in pipe; see if it is real or just empty string
# [Omitted more sophisticated tests here]
if ($sensor) {
print "upgrade requested for $sensor, forking...\n";
my $f=fork;
if(defined ($f) and $f==0) { # I'm a child
exec ("${perlhome}/upgrade-sensor.pl", @args)
} elsif (defined ($f)==0) { # Fork failed
print "Fork failed for $sensor!\n";
}
# else I'm the parent
}
print "waiting on child $$ to complete $sensor.\n";
1 while waitpid(-1, WNOHANG)>0; # reaps children
print "Finished waiting.\n";
}
Now the process tree looks like this:
1234 1 do-upgrades.pl <--- daemon
2346 1234 upgrade-sensor.pl <--- worker bee
Much simpler.
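For anyone landing here later, here is a minimal sketch of the same fork/exec pattern with the exec failure case handled. If exec fails in the loop above, the child falls through and carries on running the daemon's loop itself. The helper names spawn_worker and reap_children are made up for illustration and are not in our daemon; treat this as a sketch, not as what we actually run.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

# Hypothetical helper: fork, then exec the worker in the child.
# Returns the child's PID in the parent, or undef if fork failed.
sub spawn_worker {
    my ($cmd, @args) = @_;
    my $pid = fork;
    return unless defined $pid;          # fork failed
    if ($pid == 0) {                     # child
        # Indirect-object form of exec never goes through the shell.
        { exec { $cmd } $cmd, @args };
        # Only reached if exec itself failed.
        print STDERR "exec $cmd failed: $!\n";
        POSIX::_exit(127);               # leave without running parent cleanup
    }
    return $pid;                         # parent
}

# Hypothetical helper: collect every child that has already exited,
# without blocking. Returns a hash of pid => exit code.
sub reap_children {
    my %exited;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exited{$pid} = $? >> 8;
    }
    return %exited;
}
```

In the daemon loop this would be called as spawn_worker("${perlhome}/upgrade-sensor.pl", @args), followed by reap_children() on each pass. Note that with WNOHANG the reaping never waits: a worker that is still running is simply picked up on a later pass.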
Running the daemon from the command line, we then saw that the output of the fork/exec'd upgrade-sensor.pl process was being written into the daemon's log. More specifically, that is the log created when the daemon is launched like this:
/usr/bin/nohup ${RFS_HOME}/do-upgrades.pl > ${WORK_PATH}/do.log &
(Still need to write an init.d/ controller).
What we did to make it work (we think) was to kill off the dup'd handles and re-open them. We added the two close commands at the head of the upgrade-sensor.pl script, and now the logs are properly separated. Love inheriting code!
close (STDERR);
close (STDOUT);
my $logabspath = $logpath . "/" . $logfile;
open (STDOUT, "| tee -ai $logabspath");
open (STDERR, "| tee -ai $logabspath");
print "\n\n----------------------------------------------\n";
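One possible variation, offered only as a sketch: re-open the two handles straight onto the log file in append mode, without the two tee processes. This assumes the only thing wanted is the copy in the log file (nothing gets echoed anywhere else). The helper name redirect_output_to is made up for illustration.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: point STDOUT and STDERR at one log file, appending.
# Re-opening an already open handle closes it first, so no explicit
# close() calls are needed.
sub redirect_output_to {
    my ($logabspath) = @_;
    open(STDOUT, '>>', $logabspath)
        or die "Cannot reopen STDOUT to $logabspath: $!";
    open(STDERR, '>&', \*STDOUT)
        or die "Cannot dup STDOUT onto STDERR: $!";
    # Unbuffered, so lines from the two handles interleave in order.
    STDOUT->autoflush(1);
    STDERR->autoflush(1);
    return;
}
```

At the head of upgrade-sensor.pl this would be called as redirect_output_to($logpath . "/" . $logfile). Checking the return value of each open at least tells you when the log cannot be written.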
What we don't know is whether our close-and-reopen approach is correct. It works the same way on two hosts:
$ uname -a
Linux toolchain 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ uname -a
Linux oldtoolchain 2.6.32-754.17.1.el6.x86_64 #1 SMP Tue Jul 2 12:42:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
so that's good. But the following messages are not coming from our code. The first is issued upon calling fork, the second from waitpid:
Child forked (pid=14825), 1 processes running now at ./do-upgrades.pl line 49.
Unknow process 0 has exited, ignoring it at ./do-upgrades.pl line 84
Yes, it is ->Unknow<-.
Thanks again - Clarkman