Hi,
On Fri, Oct 23, 2020 at 11:46:11AM +0000, Andy Smith wrote:
> On Fri, Oct 23, 2020 at 11:19:21AM +0000, Andy Smith wrote:
> > I'm trying to isolate the issue to one particular VM because if a
> > guest can crash the host then it's a bug in the hypervisor and
> > just moving guests around won't solve the problem.
>
> I can't find it. As we have had problems with elephant before I'm
> going to assume hardware problem and start moving customer VMs to
> other hosts.
While moving customer VMs to other hosts, booting one of them caused
server "macallan" to crash in exactly the same way. So, I am ruling
out hardware issues with "elephant".
By preventing this particular VM from booting I was able to boot all
of the other VMs on "macallan". I have some hope that it is just
this one VM that is tickling a particularly nasty bug.
I am going to try now starting the remainder of VMs on "elephant".
If that is successful I will then take the suspect VM to test
hardware to see if I can further reproduce.
I am confused because I am sure I tried reverting last weekend's
hypervisor upgrade to the previous version while investigating
matters on "elephant", yet it still crashed. Possibly I made a
mistake (e.g. booted with the wrong hypervisor).
Also, everything obviously booted up fine last weekend when I did
the maintenance so possibly this customer has found a new and
unrelated bug.
The best case at this point is that I can reproduce the problem with
just that one VM, report it, get it fixed and then have to reboot
everything to deploy the fix.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Server "elephant" unexpectedly crashed, then crashed twice more
shortly after rebooting but before completely starting all VPSes. It
is now crashing every time while trying to boot VPSes. I suspected a
bug in the last round of XSA patches so reverted to the previous
hypervisor, but the problem persists. We had an issue with "elephant"
not so long ago so it might be a hardware fault (though there are no
logs to back this up).
Still investigating, sorry.
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hello,
Unfortunately - and annoyingly only a month since the last lot -
some serious security bugs have been discovered in the Xen
hypervisor and fixes for these have now been pre-disclosed, with an
embargo that ends at 1200Z on 20 October 2020.
As a result we will need to apply these fixes and reboot everything
before that time. We are likely to do this in the early hours of the
morning UK time, on 17, 18 and 19 October.
In the next few days individual emails will be sent out confirming
to you which hour long maintenance window your services are in. The
times will be in UTC; please note that UK is currently observing
daylight savings and as such is currently at UTC+1. We expect the
work to take between 15 and 30 minutes per bare metal host.
If you have opted in to suspend and restore¹ then your VM will be
suspended to storage and restored again after the host it is on is
rebooted. Otherwise your VM will be cleanly shut down and booted
again later.
If you cannot tolerate the downtime then please contact
support(a)bitfolk.com. We may be able to migrate² you to
already-patched hardware before the regular maintenance starts. You
can expect a few tens of seconds of pausing in that case. This
process uses suspend&restore so has the same caveats.
It is disappointing to have another round of security reboots 28
days after the last lot, though before that there was a gap of about
330 days. Still, as there are security implications we have no
choice in the matter.
Cheers,
Andy
¹ https://tools.bitfolk.com/wiki/Suspend_and_restore
² https://tools.bitfolk.com/wiki/Suspend_and_restore#Migration
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
A reminder that maintenance is scheduled for the early hours (UK
time) of 17, 18 and 19 October.
Irritatingly, this may end up having to be postponed. One of the
patches has problems and the vendor is still working on that. If
they come up with something in the next few hours I will still have
time to test it appropriately, but if they don't then I won't and
we'll have to postpone this maintenance for one week.
Please assume it is going ahead unless you are notified otherwise.
You should all have received a direct email telling you the
hour-long maintenance window that each of your VMs is in. If you can't
find it please check your spam folders etc; it was sent on 7
October.
If you still can't find it, work out which host machine you're on¹,
and then the maintenance windows are:
elephant   2020-10-17 00:00
hen        2020-10-18 02:00
hobgoblin  2020-10-18 01:00
jack       2020-10-19 00:00
leffe      2020-10-19 01:00
macallan   2020-10-17 02:00
paradox    2020-10-18 00:00
snaps      2020-10-19 02:00
talisker   2020-10-17 03:00
These times are all in UTC so add 1 hour for UK time (BST).
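
For anyone who would rather not do the conversion by hand, here is a
minimal sketch of it in Python. It assumes a fixed UTC+1 offset, which
holds for these dates only because BST ran until 25 October 2020. The
names WINDOWS_UTC and to_uk_time are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset. Assumption: every date below falls inside BST,
# which in 2020 ended on 25 October.
BST = timezone(timedelta(hours=1), "BST")

# Maintenance window start times in UTC, copied from the list above.
WINDOWS_UTC = {
    "elephant": "2020-10-17 00:00",
    "hen": "2020-10-18 02:00",
    "hobgoblin": "2020-10-18 01:00",
    "jack": "2020-10-19 00:00",
    "leffe": "2020-10-19 01:00",
    "macallan": "2020-10-17 02:00",
    "paradox": "2020-10-18 00:00",
    "snaps": "2020-10-19 02:00",
    "talisker": "2020-10-17 03:00",
}


def to_uk_time(utc_string):
    """Parse 'YYYY-MM-DD HH:MM' as UTC and return it as a BST datetime."""
    utc = datetime.strptime(utc_string, "%Y-%m-%d %H:%M")
    return utc.replace(tzinfo=timezone.utc).astimezone(BST)


if __name__ == "__main__":
    for host, start in WINDOWS_UTC.items():
        local = to_uk_time(start)
        print(f"{host:<10} {start} UTC -> {local:%Y-%m-%d %H:%M %Z}")
```

A fixed offset keeps the sketch free of any dependency on the system
time zone database; for dates outside BST a proper Europe/London zone
lookup would be needed instead.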
Cheers,
Andy
¹ This is listed on https://panel.bitfolk.com/ and is also evident
from resolving <accountname>.console.bitfolk.com in DNS, e.g.:
$ host ruminant.console.bitfolk.com
ruminant.console.bitfolk.com is an alias for console.leffe.bitfolk.com.
console.leffe.bitfolk.com is an alias for leffe.bitfolk.com.
leffe.bitfolk.com has address 85.119.80.22
leffe.bitfolk.com has IPv6 address 2001:ba8:0:1f1::d
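
If you want to script that lookup, the following is a rough sketch
that pulls the host machine name out of output shaped like the example
above. It assumes the line carrying the IPv4 address names the host
machine, as it does in that example; the function name host_machine
and the SAMPLE constant are invented for this illustration.

```python
import re

# Output of `host ruminant.console.bitfolk.com`, as in the example above.
SAMPLE = """\
ruminant.console.bitfolk.com is an alias for console.leffe.bitfolk.com.
console.leffe.bitfolk.com is an alias for leffe.bitfolk.com.
leffe.bitfolk.com has address 85.119.80.22
leffe.bitfolk.com has IPv6 address 2001:ba8:0:1f1::d
"""


def host_machine(host_output):
    """Return the short host machine name, or None if no address line is found."""
    # The alias chain ends at the name that actually has an address;
    # its first DNS label is the host machine (assumption noted above).
    match = re.search(r"^(\S+) has address ", host_output, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).split(".")[0]


if __name__ == "__main__":
    print(host_machine(SAMPLE))
```

To run it against live DNS, the text could be obtained with something
like subprocess.run(["host", name], capture_output=True,
text=True).stdout, assuming the host utility is installed.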
----- Forwarded message from Andy Smith <andy(a)bitfolk.com> -----
Date: Wed, 7 Oct 2020 09:20:29 +0000
From: Andy Smith <andy(a)bitfolk.com>
To: announce(a)lists.bitfolk.com
Subject: [bitfolk] Reboots will be necessary to address security issues, probably early hours 17/18/19 October
User-Agent: Mutt/1.5.23 (2014-03-12)
Reply-To: users(a)lists.bitfolk.com
Hello,
Unfortunately - and annoyingly only a month since the last lot -
some serious security bugs have been discovered in the Xen
hypervisor and fixes for these have now been pre-disclosed, with an
embargo that ends at 1200Z on 20 October 2020.
As a result we will need to apply these fixes and reboot everything
before that time. We are likely to do this in the early hours of the
morning UK time, on 17, 18 and 19 October.
In the next few days individual emails will be sent out confirming
to you which hour long maintenance window your services are in. The
times will be in UTC; please note that UK is currently observing
daylight savings and as such is currently at UTC+1. We expect the
work to take between 15 and 30 minutes per bare metal host.
If you have opted in to suspend and restore¹ then your VM will be
suspended to storage and restored again after the host it is on is
rebooted. Otherwise your VM will be cleanly shut down and booted
again later.
If you cannot tolerate the downtime then please contact
support(a)bitfolk.com. We may be able to migrate² you to
already-patched hardware before the regular maintenance starts. You
can expect a few tens of seconds of pausing in that case. This
process uses suspend&restore so has the same caveats.
It is disappointing to have another round of security reboots 28
days after the last lot, though before that there was a gap of about
330 days. Still, as there are security implications we have no
choice in the matter.
Cheers,
Andy
¹ https://tools.bitfolk.com/wiki/Suspend_and_restore
² https://tools.bitfolk.com/wiki/Suspend_and_restore#Migration
--
https://bitfolk.com/ -- No-nonsense VPS hosting
----- End forwarded message -----