PerlMonks  

Not getting the expected result when using eval/alarm

by McDarren (Abbot)
on Jan 08, 2006 at 05:37 UTC ( [id://521792]=perlquestion )

McDarren has asked for the wisdom of the Perl Monks concerning the following question:

Howdy,

I have a script that runs periodically via crond, which connects to a number of remote MySQL databases (around 250 hosts) to gather data. The MySQL connections are made via a ssh tunnel. It all works fine, and has been running for about two years now - except for one problem which I haven't been able to resolve. Occasionally, one or more of the remote hosts will be uncontactable (for whatever reason), and the script just hangs. To get around this, the logical thing to do seemed to simply wrap the initial connection in an eval block, and set a timeout. So I have some code that looks like so:

    HOST: foreach $host (@hostlist) {
        my $timeout = 30;
        #connect to the remote site
        Log(LOGFILE, "INFO: $0::Connecting to $host");
        eval {
            local $SIG{ALRM} = sub { die "alarm\n" };
            alarm $timeout;
            $ibisSite=openRemoteSql($host,$mySocket,"$user","$pass");
            alarm 0;
        };
        if ($@) {
            Log(LOGFILE, "ERROR: $0:$host:Cant Connect to site:$@");
            next HOST;
        }

The above is not working as expected though - as the script still hangs whenever it cannot reach any of the remote hosts, and I have to manually kill off the sub-process to get things moving again.

openRemoteSql is a home-grown routine which is called from an external library and establishes the MySQL connection over the ssh tunnel. I'm fairly certain that there is nothing wrong with it, as it's used successfully in several other scripts.

Can anybody see what I'm doing wrong here?

Thanks,
Darren

Update 1: Based on the replies I have so far, and my own niggling suspicions, it seems that perhaps openRemoteSql is the culprit. I've had a closer look at it and it does fork. The problem is that it's inherited code that was written several years ago, and I'm loath to mess with it as it has dependencies all over the place. But *sigh*.. I may have no choice - unless somebody can suggest some other workaround?

Update 2: Here is a copy of the openRemoteSql routine, and the openRemote routine that it in turn calls...

    #
    # Use openRemote to create a tunnel to the remote mysql server.
    # Then try to connect to it every 3 seconds for 15 seconds (I made these numbers
    # up ... they may need to increase for China ! )
    #
    sub openRemoteSql {
        $hostname = $_[0] ;
        $port = $_[1] ;
        $user = $_[2] ;
        $pass = $_[3] ;
        #
        # Look for a free "port" to use
        $cnt = 0 ;
        $tmp = -1 ;
        while( $tmp == -1 ) {
            if( $cnt > 10 ) {
                print "can't find a free port in $_[1] .. $port\n" ;
                return 0;
            }
            $port = $_[1] + $cnt ;
            #print $port , "\n" ;
            $tmp = openRemote( $hostname,$port,3306,$user,$pass ) ;
            $cnt ++;
        }
        #
        # if openRemote returns '0' then the ssh failed, probably can't contact
        # host or some other shyte
        #
        if( $tmp < 1 ) {
            return $tmp ;
        }
        $cnt = 0 ;
        while( $cnt < 5 ) {
            $db = DBI->connect("dbi:mysql:database=$remotedb;host=127.0.0.1;port=$port", "$user", "$pass");
            if( $db ) {
                #
                # Wow .. it all worked .....
                #
                return $db ;
            }
            $cnt ++ ;
            sleep( 3 ) ;
        }
        closeRemote($port) ;
        return 0 ;
    }
and...
    #
    # Open the SSH Connection to the site.... this provides us with a tunnel for
    # any other services !
    #
    # Return
    #   -1 Lock File Exists
    #   0 == Failure
    #   1 = allOk
    #
    sub openRemote {
        use FileHandle;
        use IPC::Open2;
        $hostname = $_[0] ;
        $lport = $_[1] ;
        $rport = $_[2] ;
        $user = $_[3] ;
        $pass = $_[4] ;
        $timeout = $_[5] ;
        # mod 001 add timeout to open ssh
        if( ! $timeout ) {
            $timeout = 120 ;
        }
        if( ! testSite( $hostname ) ) {
            return 0 ;
        }
        ##################################################
        #
        # If the Lock file exists return -1
        #
        ##################################################
        $lockFile = LockFileName( $lport ) ;
        if ( -e $lockFile ) {
            return -1 ;
        }
        $g_RemotePort = $lport ;
        ##################################################
        #
        # create the ssh gateway
        #
        ##################################################
        #
        # Save current alarm handler
        #
        $saved = $SIG{ ALRM } ;
        $SIG{ALRM} = sub { die 'Open Remote : timeout' } ;
        eval{
            alarm( $timeout ) ;
            $s = " /usr/bin/ssh -T root@" . $hostname . " -L ". $lport .":127.0.0.1:" . $rport . " -g " ;
            $pid = open2( \*SSHRead , \*SSHWrite , $s ) || die "Can't open a ssh connection to " . $hostname ;
            $debug = 0;
            if( $debug == 1 ) {
                print $hostname . "Opened Connection returned pid = $pid \n" ;
            }
            #
            # For some reason the constants are not defined ...
            # open mode is create/write/exclusive
            # if file exists .. then this will explode !
            #
            sysopen(WTMP, $lockFile, 0301) || die "Exclusive Access to $lockFile failed" ;
            print WTMP " kill -9 $pid\n" ;
            close WTMP ;
            SSHWrite->autoflush();
            #
            # we execute the following commands so we can tell when the
            # ssh has "REALLY" connected, and then we can check that we
            # are on the host we are suppesed to be on .. this is an un-necessary
            # step but I don't mind 'cos there'd be real problems if
            # something went wrong, such as an old process hanging around
            #
            print SSHWrite "echo XXXXX Started OK \n" ;
            print SSHWrite "echo \$HOSTNAME\n" ;
            SSHWrite->autoflush();
            #
            # read what is being returned from the "remote end"
            # timeout processing means that this will die if the
            # above echos are not returned "pretty quick"
            #
            while( 1 ) {
                $s = <SSHRead> ;
                if( ! defined($s)) {
                    print "SSH Failure\n";
                    return 0 ;
                }
                if( $s =~ /bind:/ ) {
                    print "Failure $s\n" ;
                    $s = <SSHRead> ;
                    print "$s\n" ;
                    return 0 ;
                }
                if( $s =~ /XXXXX Started OK/ ) {
                    $s = <SSHRead> ;
                    return 1 ;
                }
            }
        } ;
        #
        # if we timedout then set the pid to 0, code later on then handles this
        #
        if ($@) {
            if ( $@ =~ /timeout/ ){
                print "FATAL ERROR : Timeout when ssh-ing to $hostname\n" ;
                $pid = 0 ;
            }
            $globalErrorMessage = $@ ;
        }
        #
        # restore alarm handler
        #
        alarm(0) ;
        if( $saved ) {
            $SIG{ ALRM } = $saved ;
        }
        #
        # if the connection failed ... then return 0 --> Failure
        #
        if( ! $pid ) {
            closeRemote( $g_RemotePort ) ;
            return 0 ;
        }
        return 1 ;
    }
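
One detail in openRemote that may matter here: it calls alarm( $timeout ) and later alarm(0) itself. A process has only one alarm timer, so each call to alarm replaces the pending one, and alarm(0) cancels it outright. That would mean the caller's 30-second alarm is gone by the time openRemote returns, and anything that blocks afterwards (the DBI->connect loop, for instance) runs with no timer at all. A minimal sketch of that effect follows. The names inner_routine and outer_call are hypothetical stand-ins, not part of the original library, and the 4-argument select is used only as a sleep that does not itself rely on alarm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for openRemote: sets its own alarm, clears it on
# the way out, then carries on with work that can block.
sub inner_routine {
    my ($block_for) = @_;
    eval {
        local $SIG{ALRM} = sub { die "inner timeout\n" };
        alarm 120;    # replaces whatever timer the caller had set
        # ... ssh setup would happen here ...
        alarm 0;      # cancels the only timer the process has
    };
    # Stands in for a later blocking call (e.g. a slow connect).
    select(undef, undef, undef, $block_for);
    return 1;
}

# Returns "timed out" if the outer alarm fired, "completed" otherwise.
sub outer_call {
    my ($timeout, $block_for) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        inner_routine($block_for);
        alarm 0;
        1;
    };
    return $ok ? "completed" : "timed out";
}

# Outer timeout is 1 second, the blocking part takes 2 seconds.
print outer_call(1, 2), "\n";
```

Even though the outer timeout is shorter than the blocking step, the call runs to completion because the inner alarm 0 removed the timer. Since alarm returns the number of seconds left on the previous timer, one possible workaround is for the inner routine to save that value and re-arm the timer with it before returning.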

Replies are listed 'Best First'.
Re: Not getting the expected result when using eval/alarm
by revdiablo (Prior) on Jan 08, 2006 at 06:30 UTC

    I modified your example so it would work standalone (since you didn't provide your openRemoteSql routine, or example values for @hostlist, I improvised):

        use strict;
        use warnings;

        my @hostlist = (1 .. 10);

        HOST: foreach my $host (@hostlist) {
            my $timeout = 2;
            eval {
                local $SIG{ALRM} = sub { die "alarm\n" };
                alarm $timeout;
                openRemoteSql();
                alarm 0;
            };
            if ($@) {
                print "Timed out\n";
                next HOST;
            }
            else {
                print "Didn't time out\n";
            }
        }

        sub openRemoteSql {
            if (int rand 2) {
                print "Blocking\n";
                <STDIN>;
            }
        }

    I have it randomly block -- simulating a long-running process -- and in those cases, the timeout appears to work as expected. This leads me to believe the code you pasted is fine, and your problem might lie elsewhere. If nothing else, this might help you narrow down the problem further.

    Update: added strict and warnings, modified code to pass

Re: Not getting the expected result when using eval/alarm
by GrandFather (Saint) on Jan 08, 2006 at 07:32 UTC

    The following code:

        use strict;
        use warnings;

        my $timeout = 2;
        eval {
            local $SIG{ALRM} = sub { die "alarm\n" };
            alarm $timeout;
            here: goto here;
            printf "Got past the goto!\n";
            alarm 0;
        };
        print "ERROR: $@" if $@;

    Prints:

    ERROR: alarm

    which indicates that a simple non-terminating loop is not the problem. Methinks you need to look more closely at what openRemoteSql is doing.


    DWIM is Perl's answer to Gödel
Re: Not getting the expected result when using eval/alarm
by GrandFather (Saint) on Jan 08, 2006 at 11:02 UTC

    I'd be inclined to sprinkle here: goto here; type code through openRemote where you do SSH stuff to see if you can reproduce the error.

    I'd guess though that it's a problem with the SSH stuff. Read the IPC::Open2 docs, especially from the paragraph starting "open2() does not wait for and reap the child process ...". The paragraphs that follow warn of deadlock situations that may or may not apply in your case.
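
    As a rough illustration of the reaping point in the IPC::Open2 docs, here is a minimal sketch that talks to a child started with open2, gives up after a timeout, and reaps the child either way. The helper name talk_to_child is hypothetical, and cat is used only as a harmless stand-in for the ssh command, so this assumes a Unix-like system.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: start @cmd with open2, send it one line, read one
# line back, and always reap the child, even if the read times out.
# Returns the line read, or undef on timeout.
sub talk_to_child {
    my ($timeout, $line, @cmd) = @_;
    my ($reader, $writer);
    my $pid = open2($reader, $writer, @cmd);    # handles are autovivified

    my $reply = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        print {$writer} $line;
        close $writer;        # send EOF so a filter like cat can finish
        my $got = <$reader>;
        alarm 0;
        $got;
    };
    my $err = $@;
    alarm 0;                  # make sure no timer is left pending

    kill 'TERM', $pid if $err;    # child may still be running; ask it to stop
    waitpid $pid, 0;              # reap it so no zombie is left behind
    die $err if $err && $err ne "timeout\n";
    return $reply;
}

print talk_to_child(5, "hello\n", 'cat');
```

    A child that ignores TERM would still make the final waitpid block, so real code might escalate to KILL after a short grace period.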

    The next debugging step is probably to emit a log line after each SSHWrite/SSHRead operation and see where things are hanging up that way.
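
    One possible shape for that logging, as a small sketch: make_tracer is a hypothetical helper, not part of the original library, and the commented-out lines only mark where the existing SSHWrite/SSHRead calls would sit.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical tracing helper: returns a closure that writes one line per
# call to the given filehandle, prefixed with seconds since creation.
sub make_tracer {
    my ($fh) = @_;
    my $start = time;
    return sub {
        my ($msg) = @_;
        printf {$fh} "[%7.3f] %s\n", time - $start, $msg;
    };
}

my $trace = make_tracer(\*STDERR);
$trace->("about to write echo command");
# print SSHWrite "echo XXXXX Started OK \n";
$trace->("write done, about to read");
# $s = <SSHRead>;
$trace->("read returned");
```

    Writing to STDERR keeps the output unbuffered by default, so the last line printed before a hang shows which call never returned.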

    You might change:

        print SSHWrite "echo XXXXX Started OK \n" ;
        print SSHWrite "echo \$HOSTNAME\n" ;
        SSHWrite->autoflush();

    to:

        print SSHWrite "echo XXXXX Started OK \necho \$HOSTNAME\n" ;
        SSHWrite->autoflush();

    or:

        print SSHWrite "echo XXXXX Started OK \n" ;
        SSHWrite->autoflush();
        print SSHWrite "echo \$HOSTNAME\n" ;
        SSHWrite->autoflush();

    DWIM is Perl's answer to Gödel
Re: Not getting the expected result when using eval/alarm
by JamesNC (Chaplain) on Jan 08, 2006 at 14:33 UTC
    Try doing your DBI connect stuff like this:

        eval {
            $db = DBI->connect(
                "dbi:mysql:database=$remotedb;host=127.0.0.1;port=$port",
                "$user", "$pass",
                { RaiseError=>1, PrintError=>0 }
            );
        };
        if ($@){
            #handle db error (ie, host not available.. blah blah
        }else{
            return $db;
        }

    Notice I added {RaiseError=>1, PrintError=>0} to your dbi call and then eval the call to DBI.

    JamesNC
Re: Not getting the expected result when using eval/alarm
by jesuashok (Curate) on Jan 08, 2006 at 06:33 UTC
    Hi

    If you are sure that no fork happens in openRemoteSql, there should not be any problem with the code as posted.

    But you still need to consider which operating system you are using.

    Since you mentioned crond, I assumed it is Linux.

    To be on the safe side, you can call openRemoteSql from a child process and check the child's PID status when the alarm time expires.

    That should help to solve your problem.

    Sometimes the alarm status will only be returned after the process has already lost its PID.

    That may cause problems for you.
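
    A minimal sketch of that child-process idea, under stated assumptions: run_in_child is a hypothetical helper, and the code reference passed to it stands in for whatever per-host work needs a hard time limit.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a forked child and give up after
# $timeout seconds. Returns the child's exit status (0 means success),
# or undef if the child had to be killed.
sub run_in_child {
    my ($timeout, $code) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work, report success or failure via exit status.
        my $ok = eval { $code->(); 1 };
        POSIX::_exit($ok ? 0 : 1);    # skip END blocks and destructors
    }

    # Parent: wait for the child, but only for $timeout seconds.
    my $status = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        waitpid $pid, 0;
        alarm 0;
        $? >> 8;
    };
    if (!defined $status) {
        alarm 0;
        die $@ unless $@ eq "alarm\n";
        kill 'KILL', $pid;    # child is stuck; remove it
        waitpid $pid, 0;      # and reap it
        return;
    }
    return $status;
}

my $status = run_in_child(1, sub { sleep 5 });
print defined $status ? "exit status $status\n" : "timed out, child killed\n";
```

    One limitation of this approach: a database handle opened in the child cannot be handed back to the parent, so it suits either a quick reachability check or running all of the work for one host inside the child.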

    "Keep pouring your ideas"

Node Type: perlquestion [id://521792]
Approved by spiritway
Front-paged by monkfan