Hi,
On Sun, Jan 05, 2014 at 07:52:32PM +0000, Andy Smith wrote:
> I apologise for the disruption and I hope to be able to give more
> information later. I will follow up again when all customer VPSes
> are known to have booted.
All customer VPSes are believed to have booted as of about 2050Z.
If yours hasn't, please check out its console. Our Nagios thinks
that everything that was up before is back up again now.
What appears to have happened is that a customer VPS was rebooted
earlier this afternoon while under extreme memory pressure (it was
OOM-killing a lot), and its slow shutdown appears to have hit a race
condition in the host kernel which led to the xenwatch kernel thread
being left in an uninterruptible 'D' state.
In that state, the host was unable to create any further virtual
network devices, so this customer could not complete their reboot
nor could they launch the rescue environment. As no network devices
could be created, no VPS could in fact be started, which is why a
full reboot of the host was necessary.
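For anyone curious how a task stuck in the 'D' state can be spotted on
a Linux host, here is a minimal sketch that reads the state field from
/proc/<pid>/stat. It is illustrative only and is not the tooling we
used; the function names parse_stat and tasks_in_state are made up for
this example. Note that tasks pass through 'D' briefly during normal
I/O, so a single hit means little; it is a task that stays there (as
xenwatch did) that indicates a problem.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(state="D", proc_root="/proc"):
    """List (pid, comm) for every process currently in the given state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, st = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or the file was unparseable
        if st == state:
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in tasks_in_state("D"):
        print(f"{pid}\t{comm}")
```

Running it a few times some seconds apart and looking for the same
PID in every run is a crude way to tell a stuck task from one that is
merely waiting on I/O.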
This appears to be a known bug and there is probably a fix for it
that we can apply, but I did not want to do that in the middle of
this semi-emergency reboot.
This kernel version is in use on several hosts with combined uptimes
of thousands of days, so I do not think this is a commonly-hit bug.
I would rather take a little bit of time to research the fix and
then schedule a reboot for a kernel upgrade with plenty of notice.
I will keep you informed about that.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting