swkronenfeld has asked for the wisdom of the Perl Monks concerning the following question:
Updated this description: I'm looking at a program in which a connection to MySQL is established via the DBI module. Then a wget command is issued that takes a long time. During that time, the database connection times out, and must send a signal to the parent. This causes $? and $! to be set. But in addition to those, the system('wget') call returns -1. I know the wget succeeds, because the file is completely downloaded. But since system() returns -1, it looks like the wget failed!
Update2: I think Tye figured this out while we were chatting in the CB: system calls waitpid, and waitpid is interrupted by the signal. So my child wget command keeps on executing in the background, while the perl script resumes execution.
I wrote a sample program that shows the same problem, but I think that it might be causing confusion. I don't want the child to receive the signal, nor am I using this alarm as a timeout to make sure the system call doesn't hang. My use of alarm w/ system here may be wrong, as mr_mischief pointed out to me. I'm leaving it here though, because I *think* it demonstrates the same problem.
#!/usr/bin/env perl
use strict;
use warnings;
$SIG{ALRM} = sub {
print( "Alarm triggered, making system call...\n" );
unlink('/doesnt/exist'); # this will definitely fail
};
alarm( 2 );
my $retval = system('sleep 4');
if( $retval == -1 ) {
print( "system() retval = $retval; " . '$?' . " = $?; " . '$!' . " = $!\n" );
}
elsif( $retval == 0 ) {
print( "system() returned 0\n" );
}
else {
print( "system() returned $retval\n" );
}
And so here is a test run:
test$ ./test.pl
Alarm triggered, making system call...
system() retval = -1; $? = -1; $! = No such file or directory
I searched the archives, and I found an old node that may be related.
I tested this on two different systems, with two different perl versions. Some system details:
System 1:
$ perl -v
This is perl, v5.8.8 built for i386-linux-thread-multi
libc=/lib/libc-2.5.so
$ uname -a
Linux hostname.removed 2.6.18-8.1.1.el5 #1 SMP Mon Feb 26 20:38:02 EST 2007 i686 i686 i386 GNU/Linux
System 2:
test$ perl -v
This is perl, v5.8.6 built for i686-linux-64int
libc=/lib/libc-2.3.2.so
test$ uname -a
Linux hostname.removed 2.6.9-55.0.6.ELsmp #1 SMP Tue Sep 4 21:36:00 EDT 2007 i686 unknown
I haven't been able to find any documentation about this behavior. Is this intended? If so, how can I trust system's return value?
Re: System call + signals = bad return code?
by papidave (Pilgrim) on Sep 29, 2007 at 00:19 UTC
Disclaimer
The following note applies to unix-like system calls only. My mojo does not apply to windows.
As tye apparently explained to you, most SysV-style (and by extension, Linux) system calls are interruptible. The wait(2) and waitpid(2) calls are examples.
Since your database connection to MySQL needs to use sockets to communicate with the database, you may receive signals (SIGPOLL is common with asynchronous I/O, for example). Likewise, if you have a second process in the background, it can send a SIGCLD when it exits. And timers send SIGALRM.
The fun part of this is that Unix only stacks one of each signal for calls to the handler -- so your signal handler might only get called once even when two processes die. This doesn't just apply to $SIG{FOO} handlers, it also applies to the implicit handler in wait() calls. I haven't (yet) walked the Perl source to satisfy my curiosity, but I expect that since it uses waitpid(), it's trapping that case reasonably well.
Some test cases I ran using system() and alarm() show that the $! value sometimes (but not always) gets set to "interrupted system call" when this occurs. It depends on where you are in the system call when the signal arrives, I think. In any event, my general rule is to trust the value of ( $? >> 8 ), which gives the return status of the child process. YMMV, especially if you were to create more than one child process -- I don't know which one would end up in $?.
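To make the ( $? >> 8 ) rule concrete, here is a minimal sketch of decoding a wait status. The helper name decode_status is made up for illustration and the child is just the running perl exiting with code 3. Note the check for -1 first: when $? is -1, as in the original post, there is no child status to shift.

```perl
use strict;
use warnings;

# decode_status is a hypothetical helper, not part of Perl itself.
# It splits a wait status (as found in $?) into its documented parts.
sub decode_status {
    my ($status) = @_;
    # -1 means no child status is available at all
    return { no_status => 1 } if $status == -1;
    return {
        exit_code => $status >> 8,               # value the child passed to exit()
        signal    => $status & 127,              # signal that ended the child, 0 if none
        core_dump => ( $status & 128 ) ? 1 : 0,  # set if the child dumped core
    };
}

# $^X is the perl binary running this script; the child only exits with 3.
system( $^X, '-e', 'exit 3' );
my $info = decode_status($?);
print "exit code: $info->{exit_code}, signal: $info->{signal}\n";
```

This only tells you how to read $? once it holds a real status; it does not by itself fix the case where the handler has already overwritten it.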
I have definitely seen the case where the signal arrives and the child process continues running to completion long after the return code is "returned." In C, I avoid the whole thing by calling popen() and reading until end-of-file. I don't know if you can apply that technique to wget in perl using open my $fh, '-|', $cmd or not.
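For what it is worth, the popen-style approach might look roughly like this in Perl. This is an untested-against-wget sketch: the command is a stand-in (the running perl printing three lines), and whether the read loop and close() are immune to the signal problem on a given perl version is an assumption you would need to check yourself.

```perl
use strict;
use warnings;

# Stand-in command; the real one in this thread would be a long wget.
my @cmd = ( $^X, '-e', 'print "line $_\n" for 1..3; exit 0' );

# List-form piped open: no shell involved, child's stdout arrives on $fh.
open( my $fh, '-|', @cmd ) or die "cannot start $cmd[0]: $!";

my $lines = 0;
while ( defined( my $line = <$fh> ) ) {
    $lines++;    # read until end-of-file, i.e. until the child closes stdout
}

# close() on a piped handle waits for the child and sets $?.
my $closed_ok = close($fh);
my $exit_code = $? >> 8;
print "read $lines lines, exit code $exit_code\n";
```

The idea is that end-of-file on the pipe, rather than the return from a wait call, tells you the child is done.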
Re: System call + signals = bad return code?
by bruceb3 (Pilgrim) on Sep 28, 2007 at 23:11 UTC
This is a very interesting situation. I have coded up the system call into fork/exec solution and ensured that the signal handler and alarm call are only happening within the parent. Even with this the child is still exiting with -1.
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
print "".localtime(),"\n";
my $pid = fork;
die "fork failed\n" if !defined $pid;
if ($pid == 0) { # child
print "child is $$\n";
exec "/bin/sleep", "4";
die "exec failed:$!\n";
}
else { # parent
# there is no code in the parent to kill the child if the alarm is called
$SIG{ALRM} = sub {
print( "Alarm triggered, making system call in $$\n" );
unlink('/doesnt/exist'); # this will definitely fail
};
alarm(2);
my $pid = wait;
my $status = $?;
print "return value from child is $pid and status was $status\n";
}
print "".localtime(),"\n";
The output of this code is-
Sat Sep 29 09:04:00 2007
child is 26171
Alarm triggered, making system call in 26170
return value from child is -1 and status was -1
Sat Sep 29 09:04:02 2007
It seems that the failed unlink is causing the return code from the child to be trashed too. This is just speculation, of course. In the perldocs there is talk of problems with signals and different versions of OS being a contributing factor, so I coded up the original code in C;
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
void alarm_sig(int sig)
{
int rt;
puts("in alarm_sig");
if ((rt = unlink("/doesnt/exist")) == -1) {
perror("unlink failed");
}
}
int main()
{
int rt;
signal(SIGALRM, alarm_sig);
alarm(2);
rt = system("sleep 4");
printf("return value from system %d\n", rt);
}
The output of this is -
in alarm_sig
unlink failed: No such file or directory
return value from system 0
So it doesn't look like it's a problem with my operating system. Need to have a look at this further, later on.
So, working on my theory that $? and $! are being trashed because of the call to the signal handler, I localised the variables $? and $! which has made a positive difference. Here is the code.
$SIG{ALRM} = sub {
local $? = 0;
local $! = 0;
print( "Alarm triggered in $$\n" );
unlink('/doesnt/exist'); # this will definitely fail
};
alarm(2);
my $rt = system("sleep 4");
print "system returned $rt, \$? is $?\n";
And the output is -
Alarm triggered in 26346
system returned 0, $? is 0
Give it a try in your code. Let me know how it goes.
my $interrupted = 0;
$SIG{ALRM} = sub { $interrupted = 1; };
Re: System call + signals = bad return code?
by bluto (Curate) on Sep 28, 2007 at 22:25 UTC
system() is a convenience wrapper around fork/exec. Don't expect to use signals with it and have things work well. The classic example of this kind of failure is when someone uses alarm with system: The ALRM occurs, the parent then continues, but the child doesn't receive the ALRM and keeps running (on some platforms at least). I've seen similar cases where system returns an invalid result if I've already forked off an unrelated process and it happens to die when system() is executing.
If you want to trap signals, it's better to do this yourself. For example, fork/exec the child and have the parent call waitpid. If the waitpid is interrupted by the alarm, have the parent explicitly kill the child. An alternative is not to use alarm at all, but just to call waitpid about once a second. When the time is up just kill the child process.
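A rough sketch of the second alternative (no alarm, poll waitpid about once a second, kill the child when time is up) could look like the following. The helper name run_with_timeout, the TERM signal and the exit code 127 for a failed exec are my own choices for illustration, not anything system() does.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# run_with_timeout is a hypothetical helper: runs @cmd, polls for its exit,
# and kills it after $timeout seconds. Returns ($wait_status, $timed_out).
sub run_with_timeout {
    my ( $timeout, @cmd ) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: replace ourselves with the command; _exit only if exec fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    my $deadline = time() + $timeout;
    while (1) {
        my $reaped = waitpid( $pid, WNOHANG );    # non-blocking check
        return ( $?, 0 ) if $reaped == $pid;      # child finished on its own
        die "waitpid failed: $!" if $reaped == -1;
        if ( time() >= $deadline ) {
            kill 'TERM', $pid;                    # time is up
            waitpid( $pid, 0 );                   # reap the killed child
            return ( $?, 1 );
        }
        sleep 1;                                  # poll about once a second
    }
}

my ( $status, $timed_out ) = run_with_timeout( 5, $^X, '-e', 'exit 0' );
print "status $status, timed out: $timed_out\n";
```

Because the parent only ever asks about the one pid it forked, an unrelated child dying in the meantime should not be mistaken for this one.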
Update: Clarified a little
I think I'm being unclear in my original post. I don't want the child to receive the signal, nor am I using this alarm as a timeout to make sure the system call doesn't hang.
In the real code, a connection to MySQL is established via the DBI module. Then a wget command is issued that takes a long time. During that time, the database connection times out, and must send a signal to the parent. This causes $? and $! to be set. But in addition to those, the system('wget') call returns -1. I know the wget succeeds, because the file is completely downloaded. But since system() returns -1, it looks like the wget failed!
I was confused by your example. Sorry. This statement though still holds: I've seen similar cases where system returns an invalid result if I've already forked off an unrelated process and it happens to die when system() is executing. (i.e. in addition to signals, process reaping is problematic). You may want to try replacing system() with fork/exec/waitpid. The waitpid will return the status for the child pid you pass to it.
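Something along these lines is what I mean by fork/exec/waitpid. It is only a sketch: run_and_wait is a made-up name, the child is the running perl sleeping for two seconds, and the handler mimics the failing unlink from the original post. The parent retries waitpid if a signal interrupts it, and the handler localizes $! and $? (as suggested elsewhere in this thread) so it cannot clobber them.

```perl
use strict;
use warnings;
use POSIX ();
use Errno qw(EINTR);

# run_and_wait is a hypothetical helper: runs @cmd and returns its wait
# status, retrying waitpid whenever a signal interrupts it.
sub run_and_wait {
    my (@cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: become the command; _exit(127) only if exec itself fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    while (1) {
        my $reaped = waitpid( $pid, 0 );
        return $? if $reaped == $pid;    # status of exactly this child
        next if $! == EINTR;             # interrupted by a signal: wait again
        die "waitpid($pid) failed: $!";
    }
}

$SIG{ALRM} = sub {
    local ( $!, $? );                    # restore both when the handler returns
    unlink('/doesnt/exist');             # fails on purpose, as in the original
};

alarm(1);                                # signal arrives while the child runs
my $status = run_and_wait( $^X, '-e', 'sleep 2; exit 5' );
alarm(0);
print "child exit code: ", $status >> 8, "\n";
```

Since waitpid is given the specific pid, the status belongs to that child and not to whatever else happened to exit or signal in the meantime.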