On Wed, May 30, 2012 at 10:27:51PM +0000, Andy Smith wrote:
> Having forcibly re-assembled the array, we are now at the stage
> where I have access to all of the devices, but they may contain
> damaged data. I am taking a backup of the usr and var data as it is
> now, before I fsck them (fsck -n already says we are in for a bumpy
> but probably not catastrophic ride). If I manage to get the actual
> server up then I will report on what we will do next.
barbar itself has been up for a little while now. Damage to its /usr
and /var filesystems appears not to have been too severe.
I have also (with permission) managed to fsck two customers'
filesystems and boot their VPSes. One of them appeared to have only
minimal damage (about what you would expect from a hard power
cycle), the other completed an fsck without incident.
At the moment, every other customer on barbar is administratively
locked out of their Xen Shell in order to prevent them from starting
their VPSes and going straight into an fsck, as they may not have
followed all of this and may be unaware of the potential scale of
the problem.
What we're going to do:
For each customer hosted on barbar whose VPS is still down, I will:
- Run an fsck -n on their block devices provided they are ext3.
IFF that fsck -n returns cleanly:
- Start customer's VPS
OTHERWISE:
- Put a warning message in place in the Xen Shell directing
customers to the URL of this archived email.
- Take a backup copy of the customer's block devices
- Open a support ticket with the customer using the email
address we have on file for them
This support ticket will say something along the lines of:
Wah wah sky has fallen, etc.¹ As a result your VPS is not
currently running. When it IS started up, either by you or
by us, it will very likely need an fsck, and it will run
one during the boot process.
Many block devices will have corruption and doing an fsck
could possibly make this worse. Therefore we need you to
reply to this support ticket to let us know that you've read
and understood the situation and are either ready to boot
your VPS yourself or happy to have us do it for you.
Realistically, completing an fsck is the only way that your
filesystems are going to get into a state where your VPS
will run, so this will be necessary at some point; we just
don't want to take the decision out of your hands.
If we have not heard from you after 24 hours then we
will re-enable your login to your Xen Shell console and
leave you to boot your VPS yourself in your own time.
- Customers who haven't been heard from in at least 24 hours
will have their Xen Shell access restored so that they can
boot their VPSes in their own time. VPSes will not be started
for these customers without their prior permission.
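For anyone who finds it easier to follow as code, the per-customer
decision above can be sketched roughly as below. This is only an
illustration, not the tooling actually in use: the function names,
the step descriptions and the device-to-filesystem mapping are all
made up for the example. It also assumes that a device which is not
ext3 cannot be confirmed clean and so falls into the OTHERWISE
branch, which the plan above does not actually spell out. The one
firm fact it relies on is that fsck exits with status 0 when it
finds no errors.

```python
import subprocess
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical step descriptions, paraphrasing the plan in this email.
START_VPS = "start the customer's VPS"
OTHERWISE_STEPS = [
    "put a warning message in the Xen Shell",
    "take a backup copy of the customer's block devices",
    "open a support ticket with the customer",
]


def fsck_dry_run(device: str) -> int:
    """Run `fsck -n` (check only, change nothing) and return its exit status."""
    return subprocess.run(["fsck", "-n", device], check=False).returncode


def plan_for_customer(
    devices: Dict[str, str],
    check: Callable[[str], int] = fsck_dry_run,
) -> List[str]:
    """Decide what to do for one customer.

    `devices` maps a block device path to its filesystem type.
    """
    for device, fs_type in devices.items():
        # ASSUMPTION: a non-ext3 device cannot be confirmed clean here,
        # so it is treated the same as a device that fails fsck -n.
        if fs_type != "ext3" or check(device) != 0:
            return list(OTHERWISE_STEPS)
    # Every device is ext3 and fsck -n exited 0 (no errors found).
    return [START_VPS]


def should_restore_shell(
    ticket_opened: datetime, now: datetime, customer_replied: bool
) -> bool:
    """True once 24 hours have passed with no reply to the support ticket."""
    return not customer_replied and now - ticket_opened >= timedelta(hours=24)
```

Passing a stand-in for `check` lets the logic be tried out without
touching any real block device. Note that restoring Xen Shell access
after 24 hours is separate from starting the VPS, which still needs
the customer's permission.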
There will be more communications on follow-up actions at a later
date.
Cheers,
Andy
¹ Replace this with a fuller description of the evening's events,
with links to the archives etc.
--
http://bitfolk.com/ -- No-nonsense VPS hosting