in reply to Executing Systems Calls with Care

Rewriting all the remote calls to use Net::FTP and Net::Telnet would be a lot of work. Using sockets directly would be even more (and unnecessary) work.

To fix the deadlock problem, the easiest approach would IMHO be to go through the code line by line, determine which calls might hang due to problems on the remote system, and wrap such calls in

    eval {
        local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
        alarm($timeout);
        #
        # do stuff that might time out
        #
        alarm 0;
    };
    if ($@) {
        die unless $@ eq "alarm\n"; # propagate unexpected errors
        # handle the timed out operation
    }
    else {
        # operation didn't time out, handle its result
    }

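If many calls need this treatment, the eval/alarm pattern above could be factored into a small helper so it isn't copied around by hand. The following is only a sketch: the name with_timeout and its return convention (a success flag followed by the callback's results) are my own invention for illustration, not something from the original code, and it assumes a platform where alarm() and SIGALRM work as on Unix.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a time limit of $timeout seconds.
# Returns (1, @results) if $code finished, (0) if it timed out.
# Any other exception thrown by $code is rethrown unchanged.
sub with_timeout {
    my ($timeout, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };   # NB: \n required
        alarm($timeout);
        @result = $code->();    # the operation that might hang
        alarm 0;
        1;
    };
    unless ($ok) {
        my $err = $@;
        alarm 0;                # don't leave an alarm pending if $code died
        die $err unless $err eq "alarm\n";   # propagate unexpected errors
        return (0);
    }
    return (1, @result);
}

# Usage: a stand-in for a remote call that hangs longer than the limit.
# select() is used for the delay because mixing sleep() and alarm()
# is not portable.
my ($done, @out) = with_timeout(1, sub {
    select(undef, undef, undef, 3);
    return "finished";
});
print $done ? "result: @out\n" : "timed out\n";
```

Note that alarm() will not interrupt everything: a child process started with system() or backticks keeps running after the parent's eval is left, so such calls may additionally need the child to be killed in the timeout branch.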
Once you've done that, try to figure out what kind of problems on the monitoring machine you need to report to the team, and how you're going to notice if the monitoring machine silently fails.