Hi,
Between about 20:25Z and 20:50Z today, host "Jack" lost all
networking, and all of the VMs on it became unreachable.
It appears to have been some sort of bug in the kernel's Ethernet
driver module: the interface was "stuck" and not passing traffic,
yet it still showed as up.
The hosts use bonded network interfaces to protect against switch
failure, but because the interface stayed up, the bond did not
consider it failed. The bonds are also in active-backup mode and the
currently active interface was the stuck one, so all traffic kept
trying to go that way.
Networking was restored by setting the link down and then up again.
Traffic started to flow, the BGP sessions re-established and
everything was fine.
We could look into some sort of link keepalive method on the bonded
interfaces rather than relying solely on link state, but we have
already decided to move away from bonded networking in favour of
separate BGP sessions on each interface. That is how the next new
servers will be deployed; they will not have network bonding. We
have not yet tackled moving existing servers to this setup.
Had we been running without bonding, I think we would have fared
better here: there would have been a short blip while one BGP
session went down, but the other would have remained, leaving us
with some alerts and me scratching my head wondering why an
interface that is up doesn't pass traffic.
I will investigate this failure mode further, but since moving away
from bonding is already the direction we are going, I don't think I
want to alter how bonding is done on what will soon be a legacy
setup.
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hey all,
Good news for BitFolk here -- I have submitted 7 additions to the new JSfree directory in the past few days, and... today, Brad there has accepted and added 5 of them to his directory -- including BitFolk! :-D
My other suggestions that have been added are W3M, Lynx, Links, and Wttr.in. :-D The Viewable With Any Browser Campaign (which now links to JSfree.org since I suggested it to Cari there earlier this month) and Andrews & Arnold (which works without JavaScript except for its ordering pages, which can be bypassed by email or phone) have not been added yet, but Brad has not rejected or commented on them either, so we'll have to wait and see what happens. :-/
Brad also told me today about JSfree's new mailing list over at SourceHut! :-D So if you are interested in the JSfree directory, you may like to join us at: https://lists.sr.ht/~bt/jsfree-devel
I think that this is great news! :-D
Kind regards,
James R. Haigh.
--
Wealth doesn't bring happiness, but poverty brings sadness.
Sent from Debian with Claws Mail, using email subaddressing as an alternative to error-prone heuristical spam filtering.
POTS/SMS: tel:+447402916784 (UK 07402916784)
VoIP: sip:+447402916784@voiceless.AA.net.uk
Instant messaging (XMPP/Jabber): (currently broken)
Postal: James R. Haigh, Middle Farm, Vennington, nr. Westbury, nr. Shrewsbury, Salop, SY5 9RG, Britain