Hi,
On Mon, Nov 26, 2018 at 10:41:43PM +0000, Andy Smith wrote:
> At approximately 22:24Z, host "hen" rebooted
> itself unexpectedly.
This unfortunately has happened again today, at about 14:23Z.
This time I was logging the serial console to a file and so am able
to see that there was the equivalent of a kernel panic in the
hypervisor.
That is, I do not believe that hen's hardware is at fault. I think
it's tripping against a bug in Xen, and it's happened to the same
host twice because it's been triggered by the same guest doing
something (which I do not believe is malicious at this stage).
I've not got a quick fix for this because moving all customers on hen
to new hardware is likely just going to crash the hypervisor on the
other hardware. I need to discuss the problem with the Xen
developers and see if I get anywhere.
In between last time and this I also built a new version of the
hypervisor and set every host to boot into it, so hen is now
actually running a very slightly newer version than everything else
(and also compared to what it was running before). This possibly
could help, just by chance, though as far as I am aware it is not a
known bug.
So I am very sorry but I am going to have to ask you to bear with me
for a little while, while I investigate this more. Until I can
establish which guest triggered it I can't move any of the customers
on host hen to other hosts because that could just trigger it
elsewhere. And it could still happen elsewhere anyway.
If I don't make headway with this then I can revert to earlier
versions that we've been stable on for a long time, but security
issues have been fixed since then so I'm not going to do that except
as a last resort.
I will provide more information as soon as I can.
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting