Hi,
Here follows the outage report for the networking issue that occurred
on the morning of Friday 20th. If you have any questions please
feel free to ask either on or off-list; I'll answer if I can or pass
them on to James if I can't.
I understand that James has now completed his reorganisation of how
the terminal servers and masterswitches are accessed.
Cheers,
Andy
----- Forwarded message from "James A. T. Rice - Jump" -----
Hi Folks,
Apologies for the outage this morning. I'll attempt to describe what's
been found, and what is being done to help prevent it happening again
and to recover more quickly from anything similar in future.
It looks like the first signs of partial connectivity instability appeared
at around 0500Z, and at around 0630Z the instability suddenly became quite
severe.
The sup-tfm1 router had disabled CEF due to a malloc failure, but was still
mostly forwarding traffic and manageable. The sup-tfm4 router, however, had
become very unstable: BGP sessions were flapping constantly, CPU was
saturated, and management of the device was impossible.
VRRP should normally take care of failing over from a dead device; however,
sup-tfm4 was still announcing itself as the preferred gateway, despite not
being in a position to do so reliably.
The terminal servers, which speak BGP to the border routers, were mostly
accessible, as BGP to the mostly dead router was failing to fully
establish.
Reloading of sup-tfm4 was completed at about 0840Z; sup-tfm1 (generally
the backup router) was upgraded to a newer IOS and reloaded at 1000Z.
To help prevent this happening again, the following has been done:
* Upgrade IOS on sup-tfm1 (removes a few memory leaks, fragmentation problems, etc).
* Reduce the number of full BGP tables carried from 5 to 3 on each router.
There will be a plan to upgrade the IOS on sup-tfm4 in due course.
To help make resolution quicker in case similar situations occur, I'm
currently revamping the management devices (masterswitches, terminal
servers, etc) so that the terminal servers do not depend on the routers
to access the masterswitches, as well as sanity checking all the
configurations and improving them where possible.
Again, apologies for the inconvenience caused. I'm sorry we hadn't taken
the actions to streamline the management network earlier; it's been on
the todo list. Please feel free to rant etc on jump-discuss, or to me, or
to support.
Thanks
James
----- End forwarded message -----
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi,
There was effectively a complete network outage at BitFolk's
upstream provider, jump.net.uk, between around 06:34 and 08:39 GMT
this morning.
Other than that it was related to their routers, I do not as yet
have any further information. I will provide it when I do.
Cheers,
Andy
Hi,
So, the time has come with the Lenny release for me to consider
upgrading my VPS from Etch. I remember a thread a few months ago where
it was mentioned that Lenny's kernel had problems acting as a Xen domU.
Is this still the case? Are there any other gotchas that I should be
aware of?
Iain
On Sun, 15 Feb 2009 17:13:37 +0000
Iain Lane <iain(a)orangesquash.org.uk> wrote:
> Now the problem I'm having is with configuring the new kernel (this
> was the only dpkg issue with the upgrade, not bad). Here's the output:
>
> laney@cripps:~$ sudo dpkg --configure -a
> Setting up linux-image-2.6.26-1-xen-686 (2.6.26-13) ...
> update-initramfs: Generating /boot/initrd.img-2.6.26-1-xen-686
> Searching for GRUB installation directory ... found: /boot/grub
> warning: grub-probe can't find drive for /dev/sda1.
> grub-probe: error: Cannot find a GRUB drive for /dev/sda1. Check your
> device.map.
I got the same sort of error trying to update the kernel in Lenny.
I am currently running linux-image-2.6.18-6-xen-686 and tried to update
to linux-image-2.6.26-1-xen-686. In my case grub-probe could not
find /dev/xvda1.
--
John Lewis
"Kingsclere Families" website uses the GeneWeb genealogy data server
Hi folks,
BitFolk has changed bank, so for those of you who pay by standing
order, please could you amend this to:
Bank: Barclays
Sort: 20-41-41
Account: 83665453
Name: BITFOLK LIMITED
These details are also present on all invoices generated as of now,
and are also listed at:
http://bitfolk.com/contact.html
I will also be contacting you individually any time I see a bank
transfer to the old (Abbey) account. After a month or two I will be
closing that account.
I am happy to see that Barclays supports Faster Payments:
http://www.business.barclays.co.uk/BRC1/jsp/brccontrol?task=popup1group&val…
Also it would help a great deal if while you are amending your
standing orders you could make sure that you have your VPS name as
the reference.
Cheers,
Andy
Hi folks,
If you're going to FOSDEM (http://fosdem.org/2009/) then I hope to
see you there; feel free to say hello. :)
Cheers,
Andy