Hi,
Short version:
faustino had a kernel panic that seems to be related to filesystem
corruption. It was power cycled, fscked and checked, and the VPSes
were started again.
Long version:
At approximately 0250Z this morning I was alerted that an
infrastructure VPS on host faustino was not responding. On
investigation, I found that it had crashed with a kernel error on the
host machine itself. Attempting to restart the VPS caused more kernel
errors and eventually a lock-up, so it was necessary to power cycle
the host.
After the host booted, I investigated a little further before
starting the VPSes again. It seems that the host encountered an error
in its /var filesystem, which the xend process was writing to, and
this in turn crashed one of the VPSes (our infrastructure one). On
boot, fsck had carried out some repairs on the /var device.
I forced a fsck of all filesystems on the host (i.e. ours, not
yours) and they all came back clean. I then started the
infrastructure VPS that had crashed earlier, and it started up
without issue.
At ~0306Z I issued the command to start up all customer VPSes, and
this process finished at ~0321Z. System load will be heavy for a
while, as every VPS will be doing its own fsck.
I hope that the root cause of this issue was filesystem corruption
and that it is now behind us. A few other causes are possible,
though: the RAM, which was replaced on 26th February, could be at
fault, or there could be a problem with the RAID controller.
There's no real evidence for either of those yet, so for now I'm
going to have to just keep an eye on things. However, if further
problems present themselves, we have a spare server almost identical
to faustino that we can swap the disks into, or we will replace
specific parts if a clear culprit emerges.
Please accept my apologies for the disruption.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting