Hi,
Between about 20:25Z and ~20:50Z today host "Jack" lost all
networking. All of the VMs on it became unreachable.
It seems to have been some sort of kernel driver bug in the
Ethernet module as it was "stuck" not passing traffic but the
interface still showed as up.
The hosts have bonded network interfaces to protect against switch
failure, but as the interface stayed up this was not considered
failed. Also they are in active-backup mode and the currently-active
interface was the one that was stuck, so all traffic was trying to
go that way.
Networking was restored by setting the link down and up again.
Traffic started to flow again, BGP sessions re-established and all
was fine again.
We could look into some sort of link keepalive method on the bonded
interfaces as opposed to just relying on link state, but we have
already decided to move away from bonded networking in favour of
separate BGP sessions on each interface, That is how the next new
servers will be deployed; they will not have network bonding. We
have not yet tackled moving existing servers to this setup.
If we had been in the situation without bonding I think we would
have fared better here: there would have been a short blip while one
BGP session went down, but the other would remain and we'd be left
with some alerting and me scratching my head wondering why an
interface that is up doesn't pass traffic.
I will do some more investigation of this failure mode but in light
of doing away with bonding being the direction we are already going,
I don't think I want to alter how bonding is done on what will soon
be a legacy setup.
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
An unauthenticated remote root exploit has been discovered in SSH,
including in versions shipped by Debian stable and newer, and most
other up to date Linux distributions.
https://security-tracker.debian.org/tracker/CVE-2024-6387
Please make sure you have applied the necessary upgrades.
If for some reason you are unable to apply an upgrade, the issue can
be mitigated by setting LoginGraceTime to 0 in /etc/ssh/sshd_config.
This will make it easier for people to tie up all connection slots,
denying access to legitimate connections, but does avoid the remote
root exploit.
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting