http://qs321.pair.com?node_id=748953


in reply to Fetching Madness

It might be interesting to know how it hangs: does the CPU usage go up to 100%, or is it waiting for some I/O? Maybe network or DB connection troubles?

You could try to start it with strace (or whatever the equivalent is on your platform) or attach a debugger to it to gather some more information.

Re^2: Fetching Madness
by jlongino (Parson) on Mar 06, 2009 at 21:21 UTC
    CPU usage is minimal; it appears to be waiting for more data from the Oracle box. We are working with the administrators to determine what is going on from their side. I believe it is somehow connected to firewall issues.

    I've used truss and it shows that the last thing issued was a read. The point where it freezes looks like this:

    time() = 1236374108
    write(2, " [ F r i M a r 6 ".., 70) = 70
    write(8, "02 T\0\006\0\0\0\0\003 ^".., 596) = 596
    read(8, 0x003A17C6, 2064) (sleeping...)
    read(8, "07DB\0\006\0\0\0\0\010\0".., 2064) = 1380
    read(8, "\0\0\0\006 H a r r i s\0".., 631) = 631
    brk(0x003AC710) = 0
    brk(0x003B2710) = 0
    read(8, "07DB\0\006\0\0\0\0\0\0FF".., 2064) = 2011
    read(8, 0x003A17C6, 2064) (sleeping...)

    The '(sleeping...)' line is always the last thing printed.

    We've ruled out the IPS (Intrusion Prevention System) because we shut it completely off with no apparent change in behavior. We've also snooped the Perl box and it appears to confirm that the program is waiting on read results.

    Although I am worried about the cause, I'm more concerned now with finding a methodology that will terminate the program gracefully and alert us to the problem. Even if we do determine the source of the problem (I'm fairly certain we will), we'll still have the threat of similar problems in the future.
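
One possible way to get the "terminate gracefully and alert" behaviour is to wrap the blocking call in an alarm-based timeout. The sketch below is only an illustration, not something taken from the program discussed here: with_timeout is a hypothetical helper name, and the timeout values are arbitrary. It installs the SIGALRM handler through POSIX::sigaction rather than %SIG, because Perl's deferred ("safe") signals may not interrupt a read that is blocked inside a database driver.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code, but give up after $seconds.
# Returns (1, $result) on success, or (0, $error) on timeout or other failure.
sub with_timeout {
    my ($seconds, $code) = @_;

    # A handler installed via POSIX::sigaction bypasses Perl's deferred
    # signals by default, so it can fire while a system call is blocked.
    my $action = POSIX::SigAction->new(
        sub { die "timeout\n" },
        POSIX::SigSet->new(SIGALRM),
    );
    my $old = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $action, $old);

    my $result;
    my $ok = eval {
        alarm($seconds);
        $result = $code->();   # called in scalar context
        alarm(0);
        1;
    };
    my $err = $@;

    alarm(0);                          # make sure no alarm is left pending
    POSIX::sigaction(SIGALRM, $old);   # restore the previous handler

    return $ok ? (1, $result) : (0, $err);
}
```

Assuming a DBI statement handle named $sth (an assumption, as the original code is not shown), the fetch could then be wrapped as with_timeout(300, sub { $sth->fetchrow_arrayref }). When the error is "timeout\n", the program can log it, send the alert, and exit. After an interrupted read the database handle is probably not in a usable state, so it seems safer to disconnect than to retry on the same connection.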