http://qs321.pair.com?node_id=203421

Flexx has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

Be advised: this is not directly Perl-related. Don't bother reading on if you don't like off-topic posts...

In one of my projects I use File::Rsync to access multiple remote repositories which hold the same data. If syncing with one repository fails (i.e. if rsync returns with an error), I try the next repository in a prioritized list.
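To make the setup concrete, here is a minimal sketch of the fallback loop. The helper name sync_first_available and the idea of passing the actual sync step in as a code ref are only for illustration; in my case the code ref would wrap the File::Rsync call.

```perl
use strict;
use warnings;

# Hypothetical helper: try each repository in priority order.
# $sync is a code ref that returns true if syncing with the given
# repository succeeded. Returns the first repository that worked,
# or nothing if all of them failed.
sub sync_first_available {
    my ($sync, @repos) = @_;
    for my $repo (@repos) {
        return $repo if $sync->($repo);
        warn "sync with $repo failed, trying next\n";
    }
    return;
}
```

The loop itself is trivial; the whole problem is how long each failing attempt blocks before the next repository gets its turn.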

The problem is that I cannot control the connect timeout rsync uses. It has a --timeout option, but that only applies to I/O timeouts (that is, after the connection has been established). If a server is unreachable or under heavy load, it takes 75 seconds until the connection attempt times out (on my FreeBSD 3.1 box). I'd rather have it give up sooner and try another server instead.

Read on to see what I have come up with:

I thought of patching rsync to add a --connect-timeout option. However, according to my Linux glibc docs, the underlying connect call doesn't seem to allow for a connect timeout either. Another option would be to start rsync without File::Rsync (as I did before), detach it, wait timeout seconds, and kill the child afterwards. But how would I know whether the child process is still trying to connect, or is already happily downloading stuff?
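The mechanical part of the "start it, wait, kill it" idea could look roughly like the sketch below. The helper name run_with_deadline is made up, and the sketch kills the child blindly once the deadline passes, so it does not answer the question of whether the child was still connecting or already transferring data.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Hypothetical helper: run @cmd as a child process and kill it if it is
# still running after $timeout seconds. Returns the child's wait status
# ($?) if it finished in time, or undef if it had to be killed.
sub run_with_deadline {
    my ($timeout, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: replace ourselves with the command, bail out if exec fails.
        exec @cmd or POSIX::_exit(127);
    }
    my $deadline = time + $timeout;
    while (time < $deadline) {
        my $done = waitpid($pid, WNOHANG);   # non-blocking check
        return $? if $done == $pid;
        select(undef, undef, undef, 0.2);    # sleep 200 ms between checks
    }
    kill 'TERM', $pid;                       # deadline passed: give up
    waitpid($pid, 0);                        # reap the child
    return;
}
```

Used with something like run_with_deadline(10, 'rsync', @args), this would also kill a transfer that connected quickly but simply takes longer than 10 seconds, which is exactly what I want to avoid.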

I could also ping the server to see if it's alive, but I have had situations where the rsync daemon had died while pings still succeeded, so this is not really a solution.

I'd actually need to mimic rsync's protocol and try to connect to the remote rsyncd using pure Perl, maybe setting an alarm to bail out after 10 seconds... But that's not exactly a low-effort task, is it? And if I read perlipc correctly, it's not really safe to use signal handlers either...
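A cut-down version of that idea, which only connects and reads the daemon's greeting instead of speaking the full protocol, might look like this. It is a sketch under assumptions: the helper name probe_rsyncd is made up, the remote side is assumed to be an rsync daemon on the standard port 873 that opens with a line starting with "@RSYNCD:", and the alarm handler carries the signal-safety caveat from perlipc mentioned above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: connect to an rsync daemon and read its greeting,
# giving up after $timeout seconds. Returns the greeting line on
# success, or nothing on timeout, connect failure or unexpected reply.
sub probe_rsyncd {
    my ($host, $port, $timeout) = @_;
    $port    ||= 873;    # standard rsync daemon port
    $timeout ||= 10;
    my $greeting = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;  # covers both the connect and the first read
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
        ) or die "connect failed: $@\n";
        my $line = <$sock>;
        alarm 0;
        close $sock;
        $line;
    };
    alarm 0;             # make sure no alarm is left pending
    return unless defined $greeting && $greeting =~ /^\@RSYNCD:/;
    return $greeting;
}
```

The real rsync would only be started for a host where the probe succeeds. There is still a window in which the daemon can go away between the probe and the actual sync, so this narrows the problem rather than removing it.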

On my Linux box, connect looks like this:

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

No timeout here... :(

My man 7 socket tells me:

SO_RCVTIMEO and SO_SNDTIMEO
Specify the sending or receiving timeouts until reporting an error. They are fixed to a protocol specific setting in Linux and cannot be read or written. Their functionality can be emulated using alarm(2) or setitimer(2).

Hmm... So I'm back to an alarm/signal solution.

Does the timeout method of IO::Socket handle things differently? As far as I can see, it's only applied to reads/writes, but again not when connecting.
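A small experiment along the following lines might settle it. The helper name can_connect is made up; the sketch passes Timeout to the IO::Socket::INET constructor and assumes that the value is honoured during the connect itself, which is worth checking against the installed version, for instance by pointing it at an address that silently drops packets and timing the call.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: returns 1 if a TCP connection to $host:$port
# could be set up, 0 otherwise. $timeout (in seconds) is handed to the
# constructor in the hope that it limits the connect attempt.
sub can_connect {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}
```

If the timeout does apply to connecting, this would avoid the signal handler entirely, at the price of only testing reachability of the port and not whether the daemon behind it actually answers.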

In my understanding, any TCP client is affected by this, unless it implements its own connect timeout routine... For example, my rlogin doesn't provide a connect timeout option either, so if I wanted one there, I'd have the same problem...

I was going to look at some HTTP clients (like wget) to see how they implement their connect timeouts, and maybe patch rsync accordingly...

Is there any elegant/cookbook solution I fail to see? Any help or pointers are appreciated!

So long,
Flexx