http://qs321.pair.com?node_id=641620

swkronenfeld has asked for the wisdom of the Perl Monks concerning the following question:

Updated this description: I'm looking at a program in which a connection to MySQL is established via the DBI module. Then a wget command is issued that takes a long time. During that time, the database connection times out, and must send a signal to the parent. This causes $? and $! to be set. But in addition to those, the system('wget') call returns -1. I know the wget succeeds, because the file is completely downloaded. But since system() returns -1, it looks like the wget failed!

Update2: I think Tye figured this out while we were chatting in the CB: system calls waitpid, and waitpid is interrupted by the signal. So my child wget command keeps on executing in the background, while the Perl script resumes execution.

I wrote a sample program that shows the same problem, but I think that it might be causing confusion. I don't want the child to receive the signal, nor am I using this alarm as a timeout to make sure the system call doesn't hang. My use of alarm w/ system here may be wrong, as mr_mischief pointed out to me. I'm leaving it here though, because I *think* it demonstrates the same problem.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    $SIG{ALRM} = sub {
        print( "Alarm triggered, making system call...\n" );
        unlink('/doesnt/exist'); # this will definitely fail
    };

    alarm( 2 );
    my $retval = system('sleep 4');
    if( $retval == -1 ) {
        print( "system() retval = $retval; " . '$?' . " = $?; " . '$!' . " = $!\n" );
    } elsif( $retval == 0 ) {
        print( "system() returned 0\n" );
    } else {
        print( "system() return $retval\n" );
    }

And so here is a test run:

    test$ ./test.pl
    Alarm triggered, making system call...
    system() retval = -1; $? = -1; $! = No such file or directory

I searched the archives, and I found an old node that may be related.

I tested this on two different systems, with two different perl versions. Some system details:

    System 1:
    $ perl -v
    This is perl, v5.8.8 built for i386-linux-thread-multi
    libc=/lib/libc-2.5.so
    $ uname -a
    Linux hostname.removed 2.6.18-8.1.1.el5 #1 SMP Mon Feb 26 20:38:02 EST 2007 i686 i686 i386 GNU/Linux

    System 2:
    test$ perl -v
    This is perl, v5.8.6 built for i686-linux-64int
    libc=/lib/libc-2.3.2.so
    test$ uname -a
    Linux hostname.removed 2.6.9-55.0.6.ELsmp #1 SMP Tue Sep 4 21:36:00 EDT 2007 i686 unknown

I haven't been able to find any documentation about this behavior. Is this intended? If so, how can I trust system's return value?

Replies are listed 'Best First'.
Re: System call + signals = bad return code?
by papidave (Pilgrim) on Sep 29, 2007 at 00:19 UTC
    Disclaimer

    The following note applies to unix-like system calls only. My mojo does not apply to windows.

    As tye apparently explained to you, most SysV-style (and by extension, Linux) system calls are interruptible. The wait(2) and waitpid(2) calls are examples.

    Since your database connection to MySQL needs to use sockets to communicate with the database, you may receive signals (SIGPOLL is common with asynchronous I/O, for example). Likewise, if you have a second process in the background, it can send a SIGCLD when it exits. And timers send SIGALRM.

    The fun part of this is that Unix only stacks one of each signal for calls to the handler -- so your signal handler might only get called once even when two processes die. This doesn't just apply to $SIG{FOO} handlers, it also applies to the implicit handler in wait() calls. I haven't (yet) walked the Perl source to satisfy my curiosity, but I expect that since it uses waitpid(), it's trapping that case reasonably well.

    Some test cases I ran using system() and alarm() show that the $! value sometimes (but not always) gets set to "interrupted system call" when this occurs. It depends on where you are in the system call when the signal arrives, I think. In any event, my general rule is to trust the value of ( $? >> 8 ), which gives the return status of the child process. YMMV, especially if you were to create more than one child process -- I don't know which one would end up in $?.
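
    As a rough illustration of reading $?, here is a minimal sketch. The helper name describe_status is made up for this example and is not from the thread; it assumes a unix-like system with sh on the PATH. Note that a status of -1 carries no exit code at all, so it is checked before shifting.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): describe a wait status as text.
sub describe_status {
    my ($status) = @_;
    # -1 means the wait itself failed or was interrupted; there is no exit code
    return 'no status available (check $!)' if $status == -1;
    my $signal = $status & 127;          # low 7 bits: terminating signal, if any
    return "killed by signal $signal" if $signal;
    return 'exit code ' . ($status >> 8);  # high byte: the child's exit code
}

system('sh', '-c', 'exit 3');
print describe_status($?), "\n";         # exit code 3

print describe_status(-1), "\n";
```

    The point of the -1 check is that ( -1 >> 8 ) is not a meaningful exit code, so the shift is only worth trusting once you know the wait actually completed.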

    I have definitely seen the case where the signal arrives and the child process continues running to completion long after the return code is "returned." In C, I avoid the whole thing by calling popen() and reading until end-of-file. I don't know if you can apply that technique to wget in perl using open my $fh, '-|', $cmd or not.
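
    For what it's worth, the popen-style approach might look something like this in Perl. This is only a sketch under assumptions: the command is a stand-in (a small sh script rather than the real wget), and it relies on the list form of piped open, which needs Perl 5.8 or later on a unix-like system.

```perl
use strict;
use warnings;

# Stand-in command for this sketch; the real script would run wget here.
my @cmd = ('sh', '-c', 'echo downloading; sleep 1; echo done');

# Start the child with its stdout connected to $fh (no shell parsing by Perl).
open(my $fh, '-|', @cmd) or die "cannot start '@cmd': $!";

# Read until end-of-file, i.e. until the child closes its output.
while (my $line = <$fh>) {
    print "child said: $line";
}

# close() on a piped handle waits for the child and sets $? to its wait status.
close($fh);
my $exit = $? >> 8;
print "exit code: $exit\n";
```

    The design idea is the same as in C: the parent does not ask for the status until it has seen end-of-file, so an early return from a wait cannot be mistaken for the child finishing. I have not tried this against the MySQL timeout case from the original post.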

Re: System call + signals = bad return code?
by bruceb3 (Pilgrim) on Sep 28, 2007 at 23:11 UTC
    This is a very interesting situation. I have coded up the system call as a fork/exec solution and ensured that the signal handler and alarm call are only happening within the parent. Even with this the child is still exiting with -1.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        print "".localtime(),"\n";

        my $pid = fork;
        die "fork failed\n" if !defined $pid;

        if ($pid == 0) { # child
            print "child is $$\n";
            exec "/bin/sleep", "4";
            die "exec failed:$!\n";
        } else { # parent
            # there is no code in the parent to kill the child if the alarm is called
            $SIG{ALRM} = sub {
                print( "Alarm triggered, making system call in $$\n" );
                unlink('/doesnt/exist'); # this will definitely fail
            };
            alarm(2);
            my $pid = wait;
            my $status = $?;
            print "return value from child is $pid and status was $status\n";
        }

        print "".localtime(),"\n";

    The output of this code is-

        Sat Sep 29 09:04:00 2007
        child is 26171
        Alarm triggered, making system call in 26170
        return value from child is -1 and status was -1
        Sat Sep 29 09:04:02 2007

    It seems that the failed unlink is causing the return code from the child to be trashed too. This is just speculation, of course. In the perldocs there is talk of problems with signals, with different OS versions being a contributing factor, so I coded up the original code in C:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <signal.h>
        #include <unistd.h>

        void alarm_sig(int sig)
        {
            int rt;

            puts("in alarm_sig");
            if ((rt = unlink("/doesnt/exist")) == -1) {
                perror("unlink failed");
            }
        }

        int main()
        {
            int rt;

            signal(SIGALRM, alarm_sig);
            alarm(2);
            rt = system("sleep 4");
            printf("return value from system %d\n", rt);
        }

    The output of this is -

        in alarm_sig
        unlink failed: No such file or directory
        return value from system 0

    So it doesn't look like it's a problem with my operating system. Need to have a look at this further, later on.

      So, working on my theory that $? and $! are being trashed because of the call to the signal handler, I localised the variables $? and $! which has made a positive difference. Here is the code.

          $SIG{ALRM} = sub {
              local $? = 0;
              local $! = 0;
              print( "Alarm triggered in $$\n" );
              unlink('/doesnt/exist'); # this will definitely fail
          };
          alarm(2);
          my $rt = system("sleep 4");
          print "system returned $rt, \$? is $?\n";

      And the output is -

          Alarm triggered in 26346
          system returned 0, $? is 0

      Give it a try in your code. Let me know how it goes.

        Doing anything complicated in a signal handler is scary dangerous. The most common problem lies in memory management -- if your code attempts to malloc() a buffer when the main code is already in malloc(), your heap can get corrupted. It won't happen all the time, and it may not even happen often, but it can happen and it's very hard to debug.

        The Signals section in Chapter 16 (IPC) of the Camel explains the rationale for this further -- but the general rule of thumb is that your handler shouldn't do anything more complicated than updating a variable, e.g.

            my $interrupted = 0;
            $SIG{ALRM} = sub { $interrupted = 1; };

Re: System call + signals = bad return code?
by bluto (Curate) on Sep 28, 2007 at 22:25 UTC
    system() is a convenience wrapper around fork/exec. Don't expect to use signals with it and have things work well. The classic example of this kind of failure is when someone uses alarm with system: the ALRM occurs, the parent then continues, but the child doesn't receive the ALRM and keeps running (on some platforms at least). I've seen similar cases where system returns an invalid result if I've already forked off an unrelated process and it happens to die when system() is executing.

    If you want to trap signals, it's better to do this yourself. For example, fork/exec the child and have the parent call waitpid. If the waitpid is interrupted by the alarm, have the parent explicitly kill the child. An alternative is not to use alarm at all, but just to call waitpid about once a second. When the time is up, just kill the child process. Update: Clarified a little
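
    The "no alarm, poll about once a second" alternative could be sketched roughly as follows. The helper name run_with_timeout and the choice of SIGTERM are assumptions made for this example, not something from the thread, and the final blocking waitpid is left without a retry loop to keep the sketch short.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # for WNOHANG

# Hypothetical helper: run @cmd, poll once a second, and send TERM to the
# child if it is still running after $timeout seconds.
# Returns the child's wait status (same format as $?).
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {            # child
        exec @cmd;
        die "exec failed: $!";
    }

    my $deadline = time + $timeout;
    while (1) {
        my $got = waitpid($pid, WNOHANG);   # returns 0 while the child runs
        return $? if $got == $pid;          # child reaped, $? is its status
        die "waitpid failed: $!" if $got == -1;
        if (time >= $deadline) {
            kill 'TERM', $pid;              # time is up: stop the child
            waitpid($pid, 0);               # reap it so no zombie is left
            return $?;
        }
        sleep 1;
    }
}

my $status = run_with_timeout(5, 'sleep', '1');
printf "exit code %d, signal %d\n", $status >> 8, $status & 127;
```

    Because the parent never blocks inside waitpid for long, a stray signal only cuts one sleep short; the loop simply comes around and asks about the same pid again.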

      I think I'm being unclear in my original post. I don't want the child to receive the signal, nor am I using this alarm as a timeout to make sure the system call doesn't hang.

      In the real code, a connection to MySQL is established via the DBI module. Then a wget command is issued that takes a long time. During that time, the database connection times out, and must send a signal to the parent. This causes $? and $! to be set. But in addition to those, the system('wget') call returns -1. I know the wget succeeds, because the file is completely downloaded. But since system() returns -1, it looks like the wget failed!
        I was confused by your example. Sorry. This statement though still holds: I've seen similar cases where system returns an invalid result if I've already forked off an unrelated process and it happens to die when system() is executing. (i.e. in addition to signals, process reaping is problematic). You may want to try replacing system() with fork/exec/waitpid. The waitpid will return the status for the child pid you pass to it.
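
        A minimal sketch of that fork/exec/waitpid replacement, under assumptions: alarm(2) stands in for whatever unrelated signal arrives in the real program, 'sleep 4' stands in for wget, and the handler only bumps a counter so it cannot disturb $! or $?. The retry loop is harmless on perls where waitpid already restarts itself after a signal.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

my $interrupted = 0;
$SIG{ALRM} = sub { $interrupted++ };   # keep the handler trivial

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {                       # child: stand-in for the wget command
    exec 'sleep', '4';
    die "exec failed: $!";
}

alarm(2);                              # stand-in for the unrelated signal

# Wait for this specific child, and retry if a signal cut the wait short.
my $got;
do {
    $got = waitpid($pid, 0);
} while ($got == -1 && $! == EINTR);
my $status = $?;

die "waitpid failed: $!" if $got == -1;
print "signals seen while waiting: $interrupted\n";
printf "child %d finished with exit code %d\n", $got, $status >> 8;
```

        Passing the pid explicitly also addresses the reaping problem mentioned above: the status can only come from the child you asked about, not from some other process that happened to die in the meantime.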