Hi,
At around 00:33 BST (23:33Z) we started to receive alerts regarding
services on host "clockwork". Upon investigation it was showing all
the signs of being the intermittent "frozen I/O" problem we've been
having:
https://lists.bitfolk.com/lurker/message/20210425.071102.9d9a1cc5.en.html
As mentioned in that earlier email, I'd decided that the next step
would be to prepare new hypervisor packages, and I did do that the
next day.
As this issue seems to happen only every few months, and on different
servers, we do not yet know if the new packages fix the problem.
They've been in use on other servers since late April without
incident, but that isn't yet proof enough given the long periods
between occurrences.
Anyway, after "clockwork" was power cycled the new packages were
installed there and then all VMs were started again. This was
completed by about 01:16 BST (00:16Z).
There are still many of our servers where we know this is going to
happen again at some point. I don't feel comfortable scheduling
maintenance to upgrade them when I don't know if the upgrade will be
effective. If we can go a significant period of time on the newer
version without incident then we will schedule a maintenance window
to get the remaining servers on those versions too. It is also
possible that there will be a security patch that forces a
maintenance window, in which case we'll upgrade the hypervisor
packages to the newer version at the same time.
There are also some servers still left to be emptied so that their
operating systems can be upgraded. Those are "hen" and "paradox".
Once they are emptied and upgraded they will of course be put on the
newer version of the hypervisor. We expect that customer services
moved off these servers will be placed on servers that already have
the newer hypervisor version.
Thank you for your patience and I apologise for the disruption. I'm
doing all that I can to try to find a solution.
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting