Hello,
On Mon, Oct 05, 2009 at 03:38:00AM +0000, Andy Smith wrote:
> The work is finished now and seems to have gone well. faustino was
> shut down at 0100Z and everything was back up by 0255Z.
Just under a week later, faustino again locked up and had to be
power cycled. With absolutely no log traces or entries in the
hardware event log I can't be 100% sure, but I have to assume
that the previous problem has followed the software, despite all
of the hardware being new. I therefore have to conclude that an
operating system reinstall is required (which would also be an
upgrade).
This procedure is not yet fully automated for a live server and I
suspect it would take on the order of 4 hours. It has never been
attempted before and there is potential for a mistake that could
lose the existing data.
Another alternative is to move every VPS (but not the host OS
itself) to the newest server which was just provisioned yesterday --
barbar. This would eventually leave faustino empty so it could be
reinstalled and soak tested at leisure. This would be far more
work, though it would involve less downtime for individual customers.
I'm going to put some more thought into this and will follow up
later today with what the plan is going to be. Rushing at this
stage could lead to data loss and you all reinstalling from the
backups (that I'm sure you keep), which would be worse than having
to power cycle once a week. If I decide to do a reinstall, that
would likely take place midweek. If it's to be a move, that would
start very soon, although it may take all week to get through everyone.
There may yet be other solutions.
Apologies again for the problems with faustino. Naturally my
promise of service credit equal to the number of days the problem
has persisted still applies and is still counting up.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting