Hi Dom,
On Fri, Nov 23, 2018 at 09:02:41AM +0000, Dom Latter wrote:
> On 18/11/2018 02:58, Andy Smith wrote:
>> Today's maintenance work is now complete, and went without issue for
>> hosts "paradox" and "hen", but unfortunately during the boot up of
>> host "hobgoblin" the hypervisor encountered a bug and crashed.
> Since then, my VPS on hobgoblin has been having memory problems when
> running spamassassin (examples given below).
Since the security patching I've received one report of similar
problems (on another host), in that case involving php-fpm
segfaulting, and that is easy to replicate. I've also seen it myself
on a handful of occasions on BitFolk's own infrastructure VMs,
though in these cases it's not easy to reproduce.
Today another customer has sent in a support ticket saying that they
are seeing similar things.
So, although it was initially difficult to believe that the
hypervisor patching could have introduced this problem, and it is
still very hard to notice or reproduce (a handful of instances
across the entire customer base), it now seems highly unlikely that
it is anything else.
The changes that were made during the patching include three
security patches and a CPU firmware upgrade.
I am working with the customer who is reliably seeing the problem
with php-fpm, because it's easy to reproduce there and thus,
hopefully, easy to see when/if it is fixed.
What I have to do is:
- Move their VPS to a test host (running same hypervisor as
everything else) and verify the problem still occurs.
- Rebuild the hypervisor on that host without any of the patches,
reboot it and see if the problem still occurs.
- Hopefully that makes the problem go away, in which case I will go
  back to the full patch set and back the patches out one by one
  until the problem stops happening.
Assuming I do identify which patch is responsible, unfortunately I
can't simply leave it out, as these are security patches. I will
have to work with the Xen project on it and decide what to do while
that is ongoing.
All of this is somewhat time-consuming, and the ability to reproduce
the problem is key, so I will be working exclusively with the
customer who can easily reproduce it in php-fpm. It also helps that
this customer is able to take their VPS out of service and let me
experiment with it without worrying about disrupting their usual
business.
How disruptive is this for you? Does it make spamassassin completely
unusable? Can you easily reproduce it with spamassassin (or anything
else)?
If I do isolate the problem, I can move your VPS to the test host,
but as it is a test host I may need to do further work on it and
reboot it, so you may be better off where you are.
Anyway, my goal for today and the weekend is to isolate the problem,
so I would ask you to bear with me during that time; I will then
update you all.
> I am about to run aptitude upgrade, but meanwhile: is anybody else
> having problems on hobgoblin?
It's not restricted to hobgoblin and I do not think upgrading
anything will help you as others have experienced this on 64-bit
Debian stretch.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting