Hi,
A reminder that if you have a 32-bit Debian guest that you are
keeping up to date:
The Linux kernel removed support for 32-bit PV guests at version
5.9, so it will not be possible for you to upgrade from Debian
10 (buster) to 11 (bullseye) without taking action.
This was mentioned before, but since then there have been a few
casualties anyway (seemingly-unbootable guests).
https://lists.bitfolk.com/lurker/message/20210930.104643.2ab5f9c0.en.html
As you can see, as long as you are running a kernel above 4.19.0 and
are using grub-pc to boot, you can just switch to PVH mode.
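If you want to check those two conditions from inside the guest first, something like the following may help. It is only a rough sketch: the kernel_at_least helper is a made-up name for illustration, the 4.19 threshold is taken from the text above, and it assumes a Debian-style system with "sort -V" and dpkg-query available.

```shell
#!/bin/sh
# Hypothetical helper (not a BitFolk tool): succeeds if the kernel
# release in $1 is at least the version in $2.
kernel_at_least() {
    # Keep only the leading dotted-numeric part,
    # e.g. "4.19.0-17-686-pae" -> "4.19.0"
    have=$(printf '%s\n' "$1" | sed 's/[^0-9.].*$//')
    want=$2
    # sort -V orders version strings; if $want sorts first (or ties)
    # then $have is at least $want.
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

release=$(uname -r)
if kernel_at_least "$release" 4.19; then
    echo "kernel $release: new enough for PVH"
else
    echo "kernel $release: too old for PVH, upgrade the kernel first"
fi

# Assumption: grub-pc was installed as a Debian package.
if command -v dpkg-query >/dev/null 2>&1 &&
   dpkg-query -W -f='${Status}' grub-pc 2>/dev/null |
       grep -q 'install ok installed'; then
    echo "grub-pc: installed"
else
    echo "grub-pc: not found, check how this guest boots before switching"
fi
```

This only reports on the running kernel, so run it after booting the kernel you intend to keep using.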
This isn't an issue for Ubuntu, which has had no 32-bit support at
all for some time, nor for CentOS, which doesn't do upgrades between
major releases.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
You may recall that all through the first half of 2021 we've been
moving customer services off of certain servers in order to upgrade
the servers and put them back in service. That effort ground to
a halt in June because of other more pressing concerns. We're now
starting that up again to finish the job.
We sent notification emails to everyone who would be affected, but
this was back in June so you may have forgotten. These went to
customers on servers "hen" and "paradox", which are the last two
servers that need upgrading.
That notification email asked you to let us know if you need more
than 5 minutes of notice for the work to be done. If you did reply
to that, don't worry, we still have records of that and will give
you the amount of notice you asked for.
If you didn't reply then we are still assuming that 5 minutes of
notice at any time of day is fine and that's how we'll be proceeding
over the next couple of weeks. If that situation has changed then
you should look for the original notification email and reply to it
with your needs.
The last batch of notification emails was sent out to customers on
"hen" and "paradox" on Saturday 5 June 2021 with subject line:
We need to move your BitFolk VPS '$accountname' to other hardware
If you can't find it but still need to let us know, just email
support(a)bitfolk.com to open a support ticket. Again, this only
affects customers on servers "hen" and "paradox". Here's how to work
out which server your service is on:
https://bitfolk.com/customer_information.html#toc_3_Which_piece_of_actual_h…
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Yesterday evening and again this morning we've had two customers try
to upgrade their 32-bit Debian 10 VMs to Debian 11 and end up with
something that doesn't boot. This is because the Linux kernel
stopped supporting 32-bit Xen PV domains at version 5.9.
The quick workaround for those on Debian 10:
xen shell> virtmode pvh
xen shell> boot
We have talked a lot about this over on the "users" list over the
years, and for a while now the default at BitFolk has been 64-bit,
PVH mode guests, but we can't switch existing customers over to PVH
mode because it requires at least kernel version 4.19 and we don't
know what kernels you're running. So existing customers have been
left to switch on their own.
Switching to PVH mode will for now allow you to continue to run
32-bit VMs. However, aside from this, 32-bit Linux has been in
decline for some time and it's known to be less performant and less
secure than 64-bit. So the time has already passed where you should
be planning your switch to 64-bit.
== Just switching your kernel ==
Most of the advantage is to be gained by just switching the kernel,
so those running Debian could do that as Debian has good support for
this.
1. Upgrade to Debian 10 (buster)
2. Follow these instructions only up to and including the "Install a
kernel that supports both architectures in userland" step.
3. Connect to your Xen Shell
4. Shut down, boot, select the new amd64 kernel
xen shell> shutdown
xen shell> boot
If for any reason this does not work, just boot again and select
your previous i686 kernel again.
We suggest doing this in the Xen Shell so you can interact with
the boot process because the new amd64 kernel may not be listed
first in your bootloader.
5. Once satisfied that your amd64 kernel works you can remove the
i686 kernel packages.
Debian will take care of providing you with amd64 kernel updates in
future.
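After step 4 you may want to confirm which kernel you actually ended up booted into. Here is a minimal sketch of one way to do that from inside the guest. The describe_arch helper is a made-up name for illustration; it relies on "uname -m" reporting the kernel's architecture and "dpkg --print-architecture" reporting the userland's.

```shell
#!/bin/sh
# Hypothetical helper: describe a kernel/userland architecture pairing.
# $1 is the output of `uname -m`, $2 of `dpkg --print-architecture`.
describe_arch() {
    case "$1/$2" in
        x86_64/i386)  echo "amd64 kernel, 32-bit userland" ;;
        x86_64/amd64) echo "amd64 kernel, 64-bit userland" ;;
        i?86/i386)    echo "32-bit kernel, 32-bit userland" ;;
        *)            echo "unrecognised combination: $1/$2" ;;
    esac
}

machine=$(uname -m)
if command -v dpkg >/dev/null 2>&1; then
    userland=$(dpkg --print-architecture)
else
    # Not a Debian-style system, so we can't ask dpkg.
    userland=unknown
fi
describe_arch "$machine" "$userland"
```

If this still says "32-bit kernel" after the reboot then the i686 kernel was most likely selected in the bootloader, so don't remove the i686 packages yet.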
If you haven't already done so you should consider switching to PVH
mode now as well.
We do not recommend trying to fully cross-grade your operating
system to 64-bit unless you are an expert.
== Reinstall ==
You can do a reinstall in place yourself:
https://tools.bitfolk.com/wiki/Using_the_self-serve_net_installer
Don't forget to first switch your architecture to 64-bit and your
virtmode to PVH:
xen shell> arch x86_64
xen shell> virtmode pvh
as these are the modern defaults.
We can also offer a new account free for two weeks for you to
install into and move your things over.
https://tools.bitfolk.com/wiki/Migrating_to_a_new_VPS
== PVH mode? ==
Virtualisation mode is something we don't expect customers to have
to worry much about. New customers have been in PVH mode for some
time and haven't had to think about it, but existing customers will
need to make the change at some point.
Anyone with a kernel that's 4.19 or newer should be able to switch
to it. Here's more info:
https://tools.bitfolk.com/wiki/PVH
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
TL;DR: We turned off suspend+restore for everyone. We think it is
okay for you to re-enable it as long as you use kernel 4.2 or newer
(released 6 years ago), but can't tell what kernel you're running so
erred on the side of caution. We continue to use it for our own VMs.
More detail:
We've just opted you all out of suspend+restore because of the
filesystem corruption that afflicted 2 customer VMs during the
maintenance in August. There were 83 customer VMs that previously
had opted in.
While investigating that we of course did not do any suspend+restore.
I am now satisfied that we know why it happened and under
what circumstances it should be safe to use it again, but as a
precaution we have opted everyone out of it so you can make your own
decisions.
A direct email has gone out to the main contact for each VM that had
previously opted in to this. That email contains far more detail. If
you think you had opted in to suspend+restore but don't see that
email please check your spam folders etc (and then mark it as "not
spam" if necessary!).
You can see the current setting (or opt back in) here:
https://panel.bitfolk.com/account/config/#prefs
You can read more about suspend+restore here:
https://tools.bitfolk.com/wiki/Suspend_and_restore
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
We began receiving alerts at approximately 03:02Z today that host
"macallan" was unresponsive.
There was nothing interesting on its serial console. Its console
also did not respond. The out of band access to the BMC worked but
didn't show anything unusual. There were no hardware events logged.
In the face of a hard lock up all I could do was power cycle it.
All customer VMs were booted again by about 03:30Z.
I'll be keeping a close eye on this server. If this repeats then we
may have to move customers off of it at speed and with little
notice.
Apologies for the disruption this has caused.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hello,
Unfortunately some serious security bugs have been discovered in the
Xen hypervisor and fixes for these have now been pre-disclosed, with
an embargo that ends at 1200Z on 25 August 2021.
As a result we will need to apply these fixes and reboot everything
before that time. We are likely to do this in the early hours of the
morning UK time, on Tuesday 24 and Wednesday 25 August.
In the next few days individual emails will be sent out confirming
to you which hour-long maintenance window your services are in. The
times will be in UTC; please note that the UK is currently observing
daylight saving time and as such is currently at UTC+1.
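As a quick illustration of that offset, here is a small sketch that shifts an HH:MM time in UTC to UK local time. It assumes the fixed UTC+1 offset mentioned above, so it is only valid while daylight saving time is in effect, and utc_to_uk_summer is a made-up name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert an HH:MM time in UTC to UK summer
# time, assuming a fixed offset of UTC+1.
utc_to_uk_summer() {
    hour=${1%%:*}
    minute=${1#*:}
    hour=${hour#0}    # drop a leading zero so "08" is not read as octal
    printf '%02d:%s\n' $(( (hour + 1) % 24 )) "$minute"
}

utc_to_uk_summer 02:00    # prints 03:00
utc_to_uk_summer 23:30    # prints 00:30 (which is on the following day)
```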
We expect the work to take between 15 and 45 minutes per bare metal
host. We are going to take the opportunity to complete upgrading the
kernel and hypervisor on some of the hosts that haven't had that
done yet, which is why the work may take a few minutes more for some
hosts.
There are two hosts left that we are trying to migrate customers off
of ("hen" and "paradox"). That was supposed to be done by now but
that effort has been hampered by the other issues we've been having
and is dragging on. We don't intend to patch or reboot those two
hosts, instead mitigating issues with configuration and renewing
efforts to clear customers off of them. If you are concerned about
that we will be happy to move your service as a priority.
If you have opted in to suspend and restore¹ then your VM will be
suspended to storage and restored again after the host it is on is
rebooted. Otherwise your VM will be cleanly shut down and booted
again later.
If you cannot tolerate the downtime then please contact
support(a)bitfolk.com. We will be able to migrate² you to
already-patched hardware before the regular maintenance starts, at a
time of your choosing. You can expect a few tens of seconds of
pausing in that case. This process uses suspend and restore so has
the same caveats.
Thanks,
Andy
¹ https://tools.bitfolk.com/wiki/Suspend_and_restore
² https://tools.bitfolk.com/wiki/Suspend_and_restore#Migration
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
The new stable release of Debian, bullseye, was released over the
weekend:
https://bits.debian.org/2021/08/bullseye-released.html
This is now supported for self-install:
https://tools.bitfolk.com/wiki/Using_the_self-serve_net_installer#Debian
by doing "install debian_bullseye", and also of course as a new
order.
If upgrading in-place from buster to bullseye please make sure to
read the release notes as there are a few things to be aware of:
https://www.debian.org/releases/stable/amd64/release-notes/ch-upgrading.en.…
There aren't any known BitFolk-specific issues with bullseye, though
we do suggest that if you're running buster or beyond that you do so
in PVH mode:
https://tools.bitfolk.com/wiki/PVH
I *think* it is still possible to install the new testing release
(bookworm) by doing "install debian_testing" but we have to check
that and fix it if necessary, which will happen later this week.
If you're desperate to do a clean install of Debian testing and you
find that "install debian_testing" doesn't work then I recommend
installing bullseye and doing an in-place upgrade from there (will
be almost identical right now).
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
At about 19:30Z we started receiving alerts for customer services on
server "limoncello".
On investigation it quickly became apparent that this was the
intermittent "I/O stall" problem we've been seeing on all servers
and have been grappling with for months now.
All I could do was power cycle the server.
My current line of investigation is to upgrade both the hypervisor
and the kernel when this happens, and so far it hasn't reoccurred on
any of the servers where that has been done, though the sometimes
months-long gap between incidents means it's not possible to be
sure.
Although this last happened 16 days ago, that was on a different
server ("jack").
With the upgrades done the server was rebooted again and at about
20:28Z customer VMs started booting again. This was complete by
about 20:45Z.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
At about 05:23Z we started receiving alerts for customer services on
server "jack". There had been some alerts for about 40 minutes
before that, but they weren't serious enough to send push
notifications, only emails.
On investigation it quickly became apparent that this was the
intermittent "I/O stall" problem we've been seeing on all servers
and have been grappling with for months now.
All I could do was power cycle the server, which happened at about
05:30Z.
My current line of investigation is to upgrade both the hypervisor
and the kernel when this happens, and so far it hasn't reoccurred on
any of the servers where that has been done, though the sometimes
months-long gap between incidents means it's not possible to be
sure.
With the upgrades done, the server was rebooted again and at about
05:54Z customer VMs started booting again. This was complete by
about 06:08Z.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
At about 20:37Z we started receiving alerts for customer services on
server "hobgoblin". It quickly became apparent that this was the
intermittent "I/O stall" problem we've been seeing on all servers
and have been grappling with for months now.
All I could do was power cycle the server, which happened at about
20:58Z.
We're still not able to reproduce the problem on demand and it can
be several months between incidents. We've tried upgrading the
hypervisor and that hasn't helped. It's looking more like a problem
in the Linux kernel. So, I upgraded that as well to a newer
self-made package.
I've been communicating with a couple of the linux-raid devs and we
have some ideas but gathering information and making changes is
going slowly because of the lack of reproducibility and long time
between incidents. It's basically a case of making a single change
any time there is an issue.
With the upgrades done, the server was rebooted again and at about
21:19Z customer VMs started booting again. This was complete by
about 21:33Z.
Obviously I am not happy with these outages and I'm doing everything
I can to find the root cause.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting