Re: Errors in Worst Nodes - Leap second bug?

by flexvault (Monsignor)
on Jul 01, 2012 at 13:02 UTC

in reply to Errors in Worst Nodes

thomas895 and all Monks,

I received the following information concerning last night's (for me, EDT) leap second bug:

Subject: [Re:][outages]Java apps around the globe are crashing...
Sender:
Date: 06/30/12 10:20 PM

. . . However, it does not seem to be a Java bug -- so far, it looks like something is causing futex() to time out, instead of telling the thread to sleep [1], causing issues on anything that uses it (e.g., java, chrome, mysql). It's not clear exactly what variable (i.e., kernel version, distro) causes boxes to go haywire. It may just be a race condition which some people hit due to bad luck. But it is certainly related to the leap second.

[1]
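
As background on why a leap second is awkward for a system clock at all (this is my own illustration, not part of the quoted message): POSIX time counts every day as exactly 86400 seconds, so the inserted second 23:59:60 UTC has no timestamp of its own. A minimal Python sketch, using only the standard calendar module and relying on the fact that calendar.timegm does plain arithmetic on its fields:

```python
import calendar

# The leap second inserted on 2012-06-30 was labelled 23:59:60 UTC.
leap_second = calendar.timegm((2012, 6, 30, 23, 59, 60))

# The second that follows it is 00:00:00 UTC on July 1.
next_midnight = calendar.timegm((2012, 7, 1, 0, 0, 0))

# POSIX time assumes 86400-second days, so both labels collapse
# onto the same epoch value.
print(leap_second, next_midnight, leap_second == next_midnight)
```

Both calls give the same epoch value, which is why the clock needs some kind of special handling at that moment. It does not by itself explain the futex() behaviour described in the message.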

Checking our locations, many 1st and 2nd level ntpd servers are down. Following the thread, it doesn't seem that the same 'fix' works for everyone. Some were able to shut down ntpd and restart it, others had to reboot, and some machines still aren't working correctly even after a reboot.

Our ntpd servers were not affected, but many source servers are off-line.

To sum up, if a few error messages are the worst that PM suffered, I think the site did very well.


Another day in the computer age!

"Well done is better than well said." - Benjamin Franklin

Replies are listed 'Best First'.
Re^2: Errors in Worst Nodes - Leap second bug?
by ww (Archbishop) on Jul 01, 2012 at 17:58 UTC
    Wonderful example of "complicated systems break" (and, since I assume there's no intention of attributing PM issues to the leap second, ++).

