Hi,
You may recall that there was recently a period of poor network
performance because the customer DNS resolver on 212.13.194.71 was
overloaded:
http://lists.bitfolk.com/lurker/message/20110102.221800.b90128dc.en.html
In that thread I promised to provision a new dedicated resolver to
avoid a recurrence of the issue.
Instead I took the opportunity to provision several new resolver
hosts in a cluster, with failover for the service IPs.
All customers should change their resolvers from:
212.13.194.71
212.13.194.96
to:
85.119.80.232
85.119.80.233
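On a typical Linux VPS that just means editing /etc/resolv.conf. A
minimal sketch (leave any existing options or search domains alone):

# /etc/resolv.conf -- replace the old nameserver lines with:
nameserver 85.119.80.232
nameserver 85.119.80.233

If something rewrites resolv.conf for you (dhclient, resolvconf) or
you run a local caching daemon, update that configuration instead.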
There's some maintenance coming up in February (details in a
separate email, shortly) which will take 212.13.194.71 offline for
several hours. It's therefore important that you change to the new
resolvers before then, otherwise you will experience severe network
performance problems.
If you have any questions, please direct them to the users list or
support(a)bitfolk.com.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Short version:
faustino had a kernel panic that seems related to filesystem
corruption; the host was power cycled, fscked and checked, and
VPSes were started again.
Long version:
At approximately 0250Z this morning I was alerted that an
infrastructure VPS on host faustino was not responding. On
investigation it turned out the host machine itself had hit a
kernel error. Attempting to restart the VPS caused more kernel
errors and eventually a lock-up, so it was necessary to power cycle
the host.
After the host booted I carried out a little more investigation
before starting VPSes again. It seems the host encountered a
filesystem error on its /var filesystem, which the xend process was
writing to; that in turn crashed one of the VPSes (our
infrastructure one). On boot, the /var device had undergone some
fsck repair.
I forced a fsck of all of the host's own filesystems (i.e. ours,
not yours) and they all came back clean. I then started the
infrastructure VPS which had crashed earlier, and it came up
without issue.
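For anyone wondering, on a Debian-style host forcing a check of
everything at the next boot is roughly this (a sketch; the device
name in the second example is hypothetical):

# Force fsck of all filesystems in /etc/fstab on the next boot
touch /forcefsck
shutdown -r now

# Or check a single unmounted device directly, even if marked clean
fsck -f /dev/md2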
At ~0306Z I issued the command to start up all customer VPSes; this
finished at ~0321Z. System load will be heavy for a while, as every
VPS will be doing its own fsck.
I hope that the root cause of this issue was the filesystem
corruption and that it is now behind us, but it could be a few
other things. The RAM was replaced on 26th February and could be at
fault. It could also be a problem with the RAID controller.
There's no real evidence for any of that yet, so I'm just going to
have to keep an eye on things. However, if further problems present
themselves, we do have a spare server almost identical to faustino
into which we can swap the disks, or we can replace other parts if
a clear culprit emerges.
Please accept my apologies for the disruption.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
You were both right - I deleted some files and voila, email was
working again, and mysteriously the ureadahead entry disappeared
when I ran df as well.
Now down to a much healthier 18% disk usage:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda       10321208 1697448   8099472  18% /
none              293304     120    293184   1% /dev
none              297888       0    297888   0% /dev/shm
none              297888      72    297816   1% /var/run
none              297888       0    297888   0% /var/lock
none              297888       0    297888   0% /lib/init/rw
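In case anyone else runs into this, what helped me track down the
space was along these lines (paths will vary):

# Which top-level directories are using the most space?
du -xk --max-depth=1 / | sort -n

# Any individual files over 100MB on the root filesystem?
find / -xdev -type f -size +100M -exec ls -lh {} \;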
Thanks very much for your help
Andrew
On Fri, Mar 04, 2011 at 01:50:53PM +0000, Andy Bennett wrote:
> Your root partition is full and you don't appear to have a separate one
> for the spools.
>
> You may find that other things like mailboxes, logs and database files
> have all been unexpectedly truncated. Free up some space, check
> everything carefully, and be prepared to restore things from backups
> where daemons have gotten into a fix before writing out their in-memory
> data.
And maybe also consider asking for a disk space Nagios alert (this
will require either allowing checks by ssh, or running nrpe or
snmpd).
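If you go the nrpe route, the server-side definition is a one-liner
using the stock check_disk plugin. A sketch, with arbitrary
thresholds and Debian-style paths:

# /etc/nagios/nrpe.cfg: warn below 20% free, critical below 10%
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /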
Cheers,
Andy
Hi all,
I have recently started receiving critical alerts from Nagios
regarding the ntp service on my two VPSes. On both, ntpd is
functioning perfectly fine, and iptables has rules to allow traffic
through on port 123, but /etc/ntp.conf contains:
# and admin.curacao.bitfolk.com (nagios)
restrict 212.13.194.71
It looks like the recent scheduled maintenance work caused the IP of
admin.curacao.bitfolk.com to change, although I see that
nagios.bitfolk.com still points to the old IP. Assuming that my
diagnosis is correct, would it be possible to update the entry for
nagios.bitfolk.com so that our ntp.conf files can refer to
nagios.bitfolk.com rather than a hard-coded IP, to prevent future
recurrences?
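For reference, what I'd like to be able to write is something like
this (a sketch; note that ntpd resolves a hostname here only once,
at startup, so it still wouldn't pick up an IP change without a
restart):

# and admin.curacao.bitfolk.com (nagios)
restrict nagios.bitfolk.com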
On a related note, is the nagios service set up to check UDP, TCP,
or both?
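For what it's worth, NTP itself runs over UDP port 123, so the rule
I have is along these lines (a sketch; tighten the source address
to taste):

# Allow incoming NTP queries (NTP uses UDP port 123)
iptables -A INPUT -p udp --dport 123 -j ACCEPT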
Thanks!
Adam