Hi,
On Fri, Nov 23, 2018 at 11:41:23AM +0000, Andy Smith wrote:
> Anyway, it is my goal for today and the weekend to isolate the
> problem so I would ask if you can bear with me during that time and
> then I will update you all.
TL;DR:
My reproducer guest is Debian stretch and was running a
non-updated kernel. After nearly a week of research I finally
thought to check if a kernel update was available. The problem goes
away on that guest with an up to date kernel.
If you are experiencing these issues, you may want to check that you
have the latest kernel update. It should be the one that includes
fixes for L1TF (CVE-2018-3620). On Debian stretch that is
linux-image-4.9.0-8-amd64 version 4.9.130-2 or later. I can't
guarantee that fixes it, but it appears to for my sample size of 1.
Dom, what distribution / kernel are you running? If there is an
available kernel update please could you apply it and see if the
problem goes away?
The details:
I've narrowed the problem down to XSA-273:
https://xenbits.xen.org/xsa/advisory-273.html
These are hypervisor-side fixes for the L1TF vulnerability
(CVE-2018-3620). Basically when a PV guest does something naughty
(writes a page table entry that is vulnerable to L1TF), the
hypervisor switches that guest to shadow paging to stop it.
If I disable those fixes by booting the hypervisor with
"pv-l1tf=false" then the problems go away. But those fixes are
important, so I can't just disable them. It is a potential
"malicious guest can read all machine RAM" scenario.
Those fixes only come into play if your guest does something naughty,
and a guest kernel with its own L1TF fixes should not.
It was only at this point, after 5 days of research, that I thought
to check if my reproducer guest actually did have an up to date
kernel. It did not. It is Debian stretch and was running
linux-image-4.9.0-7-amd64 version 4.9.110-3+deb9u2.
As you can see from:
https://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_4.9.1…
the very next kernel package update on 19 August contained this:
linux (4.9.110-3+deb9u3) stretch-security; urgency=high
[ Salvatore Bonaccorso ]
* Add L1 Terminal Fault fixes (CVE-2018-3620, CVE-2018-3646)
After booting into an updated kernel, I can no longer reproduce the
problem on this guest. I believe that if you install a kernel with
CVE-2018-3620 fixed then there's a good chance your problems will go
away. If you have been experiencing problems then please can you do
this and let me know how it goes? Or if you need help with that,
also please let me know.
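If you want a quick way to see whether the kernel you are actually
running knows about L1TF, here is a minimal sketch in Python. It
assumes that a kernel carrying the L1TF fixes exposes a status file at
/sys/devices/system/cpu/vulnerabilities/l1tf, which is the case for
upstream kernels with the fix but which I have not checked for every
vendor backport. The function names are made up for this example.

```python
from pathlib import Path

# Assumption: kernels with the L1TF fixes report their status here.
L1TF_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def l1tf_status(path=L1TF_SYSFS):
    """Return the kernel's L1TF status line, or None if the file is absent."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # Missing file (or unreadable): the kernel does not report L1TF.
        return None


def describe(status):
    """Turn the raw status into a short human-readable verdict."""
    if status is None:
        return "kernel does not report L1TF status; it probably predates the fix"
    return "kernel reports: " + status


if __name__ == "__main__":
    print(describe(l1tf_status()))
```

A missing file is only a hint that the kernel is too old. The package
version is still the thing to go by.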
As far as I know, the following packages contain these fixes:
Debian jessie: linux 3.16.59-1
Debian stretch: linux 4.9.110-3+deb9u6
Debian testing: linux 4.18.10-2
Ubuntu 14.04: linux 3.13.0-155.205
Ubuntu 16.04: linux 4.4.0-133.159
Ubuntu 18.04: linux 4.15.0-32.35
CentOS 6.x: kernel 2.6.32-754.3.5.el6
CentOS 7.x: kernel 3.10.0-862.11.6.el7
Upstream linux: 4.19~rc1
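Comparing these version strings by eye is error-prone ("deb9u2" versus
"deb9u6", or "4.19~rc1" sorting before "4.19"), so here is a rough
sketch of a Debian-style version comparison in Python, checked against
the Debian and Ubuntu entries above. The names FIXED, compare_versions
and is_fixed are invented for this example, and it only handles plain
ASCII version strings. The CentOS entries are left out because RPM
orders versions differently. On a real Debian or Ubuntu system
"dpkg --compare-versions" is the authoritative tool.

```python
# Minimum fixed versions, copied from the list in this message.
FIXED = {
    "Debian jessie": "3.16.59-1",
    "Debian stretch": "4.9.110-3+deb9u6",
    "Debian testing": "4.18.10-2",
    "Ubuntu 14.04": "3.13.0-155.205",
    "Ubuntu 16.04": "4.4.0-133.159",
    "Ubuntu 18.04": "4.15.0-32.35",
}


def _is_digit(ch):
    return "0" <= ch <= "9"


def _order(ch):
    """Sort weight of one character in a non-digit run."""
    if ch == "~":
        return -1              # tilde sorts before everything, even the end
    if ch == "" or _is_digit(ch):
        return 0               # end of string, or start of a digit run
    if ch.isalpha():
        return ord(ch)         # letters sort before other punctuation
    return ord(ch) + 256


def _compare_fragment(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and not _is_digit(a[i])) or \
              (j < len(b) and not _is_digit(b[j])):
            wa = _order(a[i] if i < len(a) else "")
            wb = _order(b[j] if j < len(b) else "")
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Digit runs are compared as numbers (an empty run counts as 0).
        si, sj = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_fragment(ua, ub) or _compare_fragment(ra, rb)


def is_fixed(release, installed_version):
    """True if installed_version is at least the version listed for release."""
    return compare_versions(installed_version, FIXED[release]) >= 0
```

For example, is_fixed("Debian stretch", "4.9.110-3+deb9u2") is False
for my reproducer guest's old kernel, while
is_fixed("Debian stretch", "4.9.130-2") is True.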
If you are for whatever reason unfortunate enough to be running an
unsupported kernel that doesn't contain this fix then that is very
bad news for you, because the Xen part only stops your guest from
reading the memory of Xen and other guests. Without a fix as
supplied by your OS vendor, a malicious process within your guest
could potentially read any of the RAM of your guest. So I would
recommend that everyone does upgrade their kernel regardless.
If this problem is resolved by a kernel update then we probably
won't seek to roll out another hypervisor upgrade.
I think there must still be a bug in Xen to cause this subtle memory
corruption, but I will need assistance from the Xen developers to
debug that and they may not be interested in doing so if a guest
kernel upgrade avoids it.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting