I had a similar issue once, where a program would mysteriously fail, mostly when I wasn't looking. What I did, in my case, was bring up the Perl debugger and watch each line run against a copy of the funky system.
It turned out some chucklehead had used signals to detect timeouts without ever resetting the signal properly... so if the process was fast enough, the new signal handler would go away when the process terminated. That was very much the case on a dev system.
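For what it's worth, here's a minimal sketch of a timeout that cleans up after itself. It is not the code from that system: `with_timeout` is a name I made up for illustration, and the body follows the usual `alarm`/`eval` idiom described in the `alarm` entry of perlfunc. The point is that the handler is installed with `local` (so the old one comes back automatically) and the alarm is cancelled on every path out.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns the code's result, or nothing (undef in scalar context) on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # 'local' restores the previous ALRM handler when the eval block exits.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);    # fast path: cancel the pending alarm
        1;
    };
    alarm(0);        # also cancel if $code died for some other reason
    unless ($ok) {
        die $@ unless $@ eq "timeout\n";    # re-throw anything that isn't our timeout
        return;
    }
    return $result;
}

my $fast = with_timeout(5, sub { 21 * 2 });
print "$fast\n";    # prints 42

my $slow = with_timeout(1, sub { sleep 3; 1 });
print defined $slow ? "finished\n" : "timed out\n";    # prints "timed out"
```

There is still a small window between leaving the `eval` and the second `alarm(0)` where a late alarm could fire with the old handler in place, so treat this as a starting point rather than something bulletproof.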
So: use the Perl debugger and watch the lines run one by one. That's how I caught a signal handler going BOING!