BitFolk Users January 2012

users@mailman.bitfolk.com

21 participants
22 discussions

Reboot scheduled for host "bellini", 2012-02-20 2200Z; host "president", 2012-02-21 2200Z

by Andy Smith

Hi folks,

Short version:

Two of our servers appear to be subject to a now-fixed kernel bug affecting IPv6, and require a reboot for a kernel upgrade. Host bellini.bitfolk.com will be rebooted on Monday 20th February at 2200Z. Provided that does fix the problem, host president.bitfolk.com will be similarly rebooted the following day, Tuesday 21st February, also at 2200Z.

Longer version:

While investigating some recent reports of poor IPv6 performance, it seems that both bellini.bitfolk.com and president.bitfolk.com are affected by a bug in the igb Intel gigabit Ethernet driver as described here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630730

The symptoms are very poor IPv6 performance, in the region of a maximum of 100 kilobit/sec. On doing a tcpdump you will see packets of length greater than 1500 bytes, followed by an ICMP6 "packet too big" message coming from our server, and then a retransmit.

These hosts have been up for 138 days (bellini) and 223 days (president), and unfortunately neither we nor any customers noticed any problems until recently. On casual inspection IPv6 works. It's only noticeable when trying to do a larger data transfer.

Since the current impact is so low, I am not going to rush to reboot these hosts. I would rather give plenty of notice to those who need it.

When it's time for the reboot we will shut down all VPSes on these servers cleanly, reboot the machine and then begin booting them up again. Downtime is expected to be in the region of 15 minutes. If you are in any doubt as to whether your VPS starts cleanly with all required services running, you should test this ahead of time.

I am fairly confident that the problems observed are caused by that bug and therefore that a kernel upgrade will fix it, but unfortunately we do not have any other hardware that uses the igb driver. If doing the upgrade on bellini does not resolve the issue then we will have to consider our options.

In the meantime, if your VPS is hosted on bellini or president then you may wish to set your VPS to prefer IPv4 DNS results ahead of IPv6 results:

https://tools.bitfolk.com/wiki/IPv6#Preferring_IPv4_over_IPv6

Cheers,
Andy

--
http://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
announce mailing list
announce(a)lists.bitfolk.com
https://lists.bitfolk.com/mailman/listinfo/announce

12 years, 8 months

IMPORTANT: You need to renumber the IP address(es) of your BitFolk VPS

by Andy Smith

Dear customer,

[ Apologies if you've received duplicate copies of this email; we felt it was of sufficient importance to send direct to the contacts for each account. ]

If you've been following our mailing lists you'll know that the time has come where we need you to change the IP address(es) associated with your VPS.

Basically:

- For each IP address on your VPS you need to enable a new address, which has already been routed to you.

- You then need to reconfigure your services to use only the new IP addresses.

- Finally you need to disable the old IP addresses.

Full information on what you need to do can be found here:

https://tools.bitfolk.com/wiki/Renumbering_for_customers

The next time you need to reboot your VPS you should take care to instead shut it down and boot it again from your Xen Shell, otherwise you will lose the routes to the new IP addresses that have been added.

The old IP addresses will be disabled approximately *three months* from now, so if you don't make these changes before then YOU WILL EXPERIENCE A LOSS OF SERVICE. Therefore if you have any questions please do not hesitate to ask, preferably on our users list:

https://lists.bitfolk.com/mailman/listinfo/users

Cheers,
Andy

--
http://bitfolk.com/ -- No-nonsense VPS hosting

12 years, 8 months

VPS still sending packets from 212.13.195.254

by Michael Corliss

After making sure that my VPS is receiving packets on the right address, I'm now getting warnings that it's sending on the old address. To my knowledge I don't have any software installed for which I needed to specify the VPS' IP, so my guess is that this will end when I remove the old address from network/interfaces. Is that right? Is there a way to test before deleting the old IP?

Thanks,
Mike

On Sat, Jan 28, 2012 at 7:14 AM, <andy(a)bitfolk.com> wrote:
> Dear customer,
>
> You're receiving this email because our monitoring has detected that, in
> the last 24 hours, your VPS "science" is still sending packets from the
> address 212.13.195.254, which is part of our deprecated range of IP
> addresses, 212.13.194.0/23.
>
> Back at the start of November 2011 we announced that a renumbering would
> be necessary, and we set a deadline of Monday 6th February 2012 for this
> to be completed:
>
> * Announcement:
>
> http://lists.bitfolk.com/lurker/message/20111104.221826.7f38e097.en.html
>
> * Full instructions:
>
> https://tools.bitfolk.com/wiki/Renumbering_for_customers
>
> There's now just over a week before the IP addresses that you are still
> making use of are taken out of service. If you don't act to stop using
> these addresses then YOU WILL EXPERIENCE A LOSS OF SERVICE once
> 212.13.194.0/23 is decommissioned.
>
> Therefore if you have any questions about the renumbering process or
> need help to complete it, it is important that you get in touch with us
> as soon as possible.
>
> Questions regarding how to renumber would be best asked on the users
> mailing list:
>
> http://lists.bitfolk.com/lurker/list/users.en.html
>
> BitFolk will only be able to carry out the work on your behalf as a
> chargeable consultancy service. Please contact support if you would like
> us to do this.
>
> It is possible that you have received this email purely because you
> still have one or more of the deprecated IP addresses active on your
> VPS. Internet hosts receive a constant "background noise" of random
> traffic. It is recommended to remove the deprecated IP addresses once
> you think you don't need them any longer, both to avoid this and to
> reassure yourself that they are really not in use by other systems.
>
> Best regards,
> Andy Smith
> BitFolk Ltd
>

12 years, 8 months

Additional monitoring for backups

by Andy Smith

Hi,

If you don't currently take advantage of BitFolk's free backups service then the following doesn't apply to you.

Two extra checks were added last night on the backups that BitFolk manages for you. All this info is going to go on a page I will create at https://tools.bitfolk.com/wiki/Backups but you need to know it now.

- Backup age

  This checks that your most recent successful¹ backup is not too old. The warning threshold is 2.5x the appropriate interval. So if your backups happen every 4 hours, you'll receive an alert if they are ever older than 10 hours. If your backups happen daily then you'll receive an alert if they are ever more than 60 hours old. And so on.

  This alerting replaces the manual process of me being sent excerpts of log files that say a customer's backups are failing, then opening a support ticket with the customer to make them aware.

  If you receive this alert then your backups are definitely not happening.

  The alert looks like this:

    From: nagios(a)bitfolk.com
    Subject: ** PROBLEM alert - backup0.bitfolk.com/Backup age youraccount is CRITICAL **

    ***** Nagios *****

    Notification Type: PROBLEM

    Service: Backup age youraccount
    Host: backup0.bitfolk.com
    Address: 85.119.80.240
    State: CRITICAL

    Date/Time: Tue Jan 31 12:37:38 UTC 2012

    Additional Info:

    FILE_AGE CRITICAL: /data/backup/rsnapshot.6-7-4-6/hourly.0/85.119.82.121 is 16842912 seconds old and 4096 bytes

  (Those who haven't had a successful backup run in the last couple of days will have a huge number of seconds listed there because the backup system was only modified to record last successful contact recently.)

- Backup space usage

  This checks that your total backup space usage is not approaching your current quota. The thresholds are 95% for a warning and 99% for a critical.

  This alerting replaces the manual process of me being warned about customers who are exceeding their quota and then opening support tickets with them to discuss what they want to do about it².

  If you receive this alert then your backups are still happening, but you're in danger of (or already are) using more than the agreed space. If you exceed your quota then we may disable your backups, so that would eventually cause the above backup age alert to fire.

  Please note that we can't update the measurement of how much space you're using very often. Backup directories contain hundreds of millions of files, many of which are copies of each other, but it's not possible to tell without looking. Adding it all up takes quite a long time and stresses disk IO quite badly. So we only update quotas every day at best at the moment.

  Also please note that since this is a backup system, deleting files on your VPS does not immediately result in using less disk space for backups. It would be a rather pointless backup system if it threw away deleted files immediately. :) Anything that gets backed up is going to be kept for as long as your chosen backup schedule dictates, e.g. 6 months by default. If you have blown your quota by accidentally allowing large amounts of data to be backed up, you are still going to have to contact support to get it deleted.

  The alert looks like this:

    From: nagios(a)bitfolk.com
    Subject: ** PROBLEM alert - backup2.bitfolk.com/Backup space usage youraccount is WARNING **

    ***** Nagios *****

    Notification Type: PROBLEM

    Service: Backup space usage youraccount
    Host: backup2.bitfolk.com
    Address: 85.119.80.230
    State: WARNING

    Date/Time: Tue Jan 31 15:52:28 UTC 2012

    Additional Info:

    WARNING 98.50% (394/400MiB) used

  It is possible for the usage to go above 100% because we do allow you to go over your quota for short periods of time.

Cheers,
Andy

¹ "Successful" as in, rsync connected to your host, did some stuff and then exited with a non-error exit code. It does not necessarily mean that what you think should be backed up is being backed up. As with any backup solution you need to assure yourself on a regular basis that it's doing what you expect.

² Generally one or more of:

  - Buy some more disk space for backups
  - Back up fewer files
  - Back up less often

--
http://bitfolk.com/ -- No-nonsense VPS hosting
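The two thresholds described in this announcement can be written out as a small sketch. This is not BitFolk's actual Nagios check, only the arithmetic from the email; the function names are hypothetical, and treating a value exactly on a threshold as triggering the alert is an assumption.

```python
from datetime import timedelta

# Figures taken from the announcement above.
AGE_FACTOR = 2.5   # backup age alert at 2.5x the backup interval
WARN_PCT = 95.0    # space usage warning threshold
CRIT_PCT = 99.0    # space usage critical threshold


def backup_age_limit(interval):
    """Oldest acceptable age of the latest successful backup (hypothetical helper)."""
    return interval * AGE_FACTOR


def space_state(used_mib, quota_mib):
    """Classify backup space usage; the >= comparisons are an assumption."""
    pct = 100.0 * used_mib / quota_mib
    if pct >= CRIT_PCT:
        return "CRITICAL"
    if pct >= WARN_PCT:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(backup_age_limit(timedelta(hours=4)))   # 4-hourly backups
    print(backup_age_limit(timedelta(days=1)))    # daily backups
    print(space_state(394, 400))                  # the figures from the sample alert
```

With the sample alert's figures (394 of 400 MiB, i.e. 98.50%) the usage falls between the two thresholds, which matches the WARNING state shown in the email.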

12 years, 8 months

IP renumbering

by Ian

Hi,

Some of this was easy (changing where the slave DNSes got their info from was a simple sed search and replace) and some tedious (changing all the master DNS: because the serial number needed changing for all of them, this was done by hand via a control panel. If there was an easier way, I am not sure I want to know now :) )

The one thing not mentioned on the wiki is that doing

> grep -r 212.13.19 *

in /etc now comes up with one hit, in ntp.conf:

> # and admin.curacao.bitfolk.com (nagios)
> restrict 212.13.194.71

so I changed that to 85.119.80.238 and 85.119.80.244 per the customer information page on the website and restarted ntp.

Now I think it's just a case of waiting for the rest of the net to notice the DNS changes...

Ian
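The grep above matches on a text prefix, so it would also hit addresses such as 212.13.196.x that are outside the deprecated range. As a rough alternative sketch (not something from the wiki), the check can be done against the actual 212.13.194.0/23 network with Python's standard ipaddress module. The function name and the simple IPv4 regular expression are assumptions made for illustration.

```python
import ipaddress
import re

# Deprecated range named in the renumbering notices.
OLD_RANGE = ipaddress.ip_network("212.13.194.0/23")

# Loose pattern for dotted-quad candidates; validity is checked afterwards.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def deprecated_addresses(text, network=OLD_RANGE):
    """Return (line_number, address) pairs for IPv4 literals inside `network`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_RE.findall(line):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            if addr in network:
                hits.append((lineno, candidate))
    return hits


if __name__ == "__main__":
    sample = (
        "# and admin.curacao.bitfolk.com (nagios)\n"
        "restrict 212.13.194.71\n"
        "restrict 85.119.80.238\n"
    )
    for lineno, addr in deprecated_addresses(sample):
        print(f"line {lineno}: {addr}")
```

To cover a whole directory such as /etc, the same function could be applied to each file's contents in turn, for example by walking the tree with os.walk.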

12 years, 8 months

DNS refresh and expire values, alerting

by Andy Smith

Hello,

Long email about DNS timers and alerting based on them. Unless you have domains on BitFolk's secondary DNS platform you probably won't care about this, and even then you still probably don't care unless you've been receiving alerts about them. Turn back now!

Still here? OK.

I've recently implemented DNS secondary domain zone age alerts. They send alerts when the zone on BitFolk's nameservers is too old. This saves me having to read logs and open a support ticket to advise customers that the zone transfers are failing, so I'm all in favour of that.

The definition of "too old" differs on a per-domain basis. There are two values in the SOA record of a DNS domain: refresh and expire.

The refresh value tells secondary servers how often to check in with the primary.

The expire value tells secondary servers how long they should consider themselves valid for without successful contact with the primary. If there is no contact with the primary for the expire period then the secondary server stops serving the domain and returns SERVFAIL for every query.

So, based on the above, a DNS domain should never be "older" than refresh. If it is older then that means that at least one refresh attempt failed. If the age approaches expire then the domain is in danger of not being served.

At the moment I have decided to send a warning alert on 150% of refresh and a critical alert on 50% of expire.

RIPE recommends 86400 (one day) for refresh and 3600000 (1000 hours; almost 6 weeks) for expire:

http://www.ripe.net/ripe/docs/ripe-203

RFC1912 (1996) recommends one day for refresh and 2-4 weeks for expire:

http://www.faqs.org/rfcs/rfc1912.html

So let's say you go with RIPE's recommendations. You'd receive a warning alert after your secondary DNS setup was broken for 36 hours, and you'd receive a critical alert if it was still broken after 500 hours (almost 3 weeks). 500 hours after that, your domain stops being served on the secondary servers. That seems reasonable.

Finally getting around to the point of this email: what do you think I should do about problematic SOA values that customers have chosen?

For example, there are some domains currently on BitFolk's servers where the refresh and expire are both set to 300 seconds (5 minutes). Ignoring what happens with alerts for a moment, that means that every 5 minutes the secondary servers check the primary, and if that fails even once, the domain will return SERVFAIL for all queries until contact is made again.

I can't understand what the use is of such a fragile setting; it looks erroneous to me. This isn't just DNS purism saying, "ooh, I don't like your non-standard values!" It will actually cause breakage very easily.

But perhaps it is not for me to reason why. Those domains have been like that for a long time and I assume no one has noticed. It must have caused some problems any time the primary nameserver was unreachable by the secondary servers. But arguably that is not my problem.

When combined with this new alerting though, what happens is that there isn't a refresh for 5 minutes, then 2.5 minutes into that a critical alert fires since we're half way to expire (5 minutes). All being well there should be a recovery ~2.5 mins later. In reality these times will be variable because BitFolk's Nagios doesn't check DNS every few minutes, more like an hour plus.

That is the most extreme example of this problem, but there are a few other domains in there where refresh and expire have been set to the same value. It will lead to a cycle of alert and then recovery, forever.

So, what do you think I should do?

I'm not willing to give up on the alerts because I think most people would like to know when their DNS setup is broken (or in danger of being broken), and it saves me having to personally interact to tell people this.

Intentional DNS breakage is not my problem, but answering/opening support tickets is.

Alerting can be disabled on a per-domain basis. Currently only by asking support, but eventually you'll be able to flip that on the Panel¹.

So how about having the Panel warn on the web page about what are considered unwise SOA values, and just allow the alerts to be disabled if for some reason this sort of fragile DNS setup is intentional?

Cheers,
Andy

¹ https://panel.bitfolk.com/dns/#toc-secondary-dns

--
http://bitfolk.com/ -- No-nonsense VPS hosting
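To make the alert arithmetic in this email concrete, here is a minimal sketch of the two thresholds (warning at 150% of refresh, critical at 50% of expire). It is not the actual Nagios check. The function names are hypothetical, as is the "fragile" test, which simply flags the case the email describes, where expire is no larger than refresh.

```python
def zone_age_thresholds(refresh, expire):
    """Return (warning_age, critical_age) in seconds for the given SOA timers."""
    return 1.5 * refresh, 0.5 * expire


def zone_state(age, refresh, expire):
    """Classify a zone of the given age (seconds since last successful refresh)."""
    warning, critical = zone_age_thresholds(refresh, expire)
    if age >= critical:
        return "CRITICAL"
    if age >= warning:
        return "WARNING"
    return "OK"


def is_fragile(refresh, expire):
    """Hypothetical Panel check: expire no larger than refresh, so a single
    failed refresh is enough for the zone to stop being served."""
    return expire <= refresh


if __name__ == "__main__":
    # RIPE-style values: one day (86400 s) refresh, 1000 hours (3600000 s) expire.
    warn, crit = zone_age_thresholds(86400, 3600000)
    print(warn / 3600, crit / 3600)          # thresholds in hours

    # The problematic case from the email: both timers set to 300 seconds.
    print(zone_age_thresholds(300, 300))
    print(zone_state(200, 300, 300), is_fragile(300, 300))
```

With both timers at 300 seconds the critical threshold (150 seconds) is lower than the warning threshold (450 seconds) and even lower than refresh itself, which is why such zones cycle between critical alert and recovery.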

12 years, 8 months

Additional alerting for DNS secondary service

by Andy Smith

Hi,

I've just enabled some additional alerting for the DNS secondary service. It will copy you in on alerts regarding BitFolk nameservers that are not correctly serving your domain.

They look like this:

    From: nagios(a)bitfolk.com
    Subject: ** PROBLEM alert - a.authns.bitfolk.com/Auth. DNS example.com is UNKNOWN **

    ***** Nagios *****

    Notification Type: PROBLEM

    Service: Auth. DNS example.com
    Host: a.authns.bitfolk.com
    Address: 85.119.80.222
    State: UNKNOWN

    Date/Time: Fri Jan 27 19:46:10 UTC 2012

    Additional Info:

    DNS UNKNOWN - 1.444 seconds response time (No ANSWER SECTION found)

This would indicate that a.authns.bitfolk.com is not serving the domain example.com when we would expect it to be.

The reason for this change is that there are quite a few customer domains that aren't being served correctly and rather than keep opening tickets and chasing it up, I would rather let Nagios do what it is designed for.

I am also working on a check that AXFRs are happening.

Cheers,
Andy

--
http://bitfolk.com/ -- No-nonsense VPS hosting

12 years, 8 months

RAM requirements for CentOS 6.x / Scientific Linux 6.x install

by Andy Smith

Hello,

It appears that recently, CentOS 6.x and Scientific Linux 6.x installers started to require 512MiB RAM. Our smallest and most popular VPS plan currently has 480MiB RAM. That means that the average¹ BitFolk customer now cannot self-install derivatives of RHEL 6.x.

This is extremely annoying since I suspect that these distributions work just the same in 480MiB RAM now as they did a few months ago.

I can't find a simple way to override that check (please let me know if you know of one), and I'm not quite ready to increase the default RAM allocation to 512MiB.

In the short term I am tempted to make the installer boot with 512MiB RAM if you have 480MiB. It will then revert to 480MiB upon normal use.

Any comments?

Cheers,
Andy

¹ Mode and median. The mean is 641MiB.

--
http://bitfolk.com/ -- No-nonsense VPS hosting

Q. How many mathematicians does it take to change a light bulb?
A. Only one - who gives it to six Californians, thereby reducing the problem to an earlier joke.

12 years, 8 months

Finding out old IP users

by Taavi Ilves

Hi,

I'm pretty sure that my host has been fully configured for the new IP, but I got a notification that someone has used the old IP recently. Does anyone have a simple way of finding out where those connections to the old IP are coming from, and over which protocol?

Thanks,
Taavi

12 years, 9 months

Lenny-to-Squeeze Upgrade Plan - 2nd opinions sought!

by Mathew Newton

Hi everyone,

I have decided to venture down what I hope is a well-trodden path by now: upgrading my VPS from Debian Lenny to Squeeze.

I have scoured the list archives and tried to make the most of http://www.debian.org/releases/squeeze/i386/release-notes/ch-upgrading.en.h… however I'm not ashamed to admit that I'm no expert in this regard and very much still learning, so I would appreciate a critique of my plan of action:

- Ask Support kindly to perform a temporary disk snapshot

- Log in via Xen console

- Verify no pending actions required for currently installed packages:

    aptitude
    (Then hit 'g' once in 'visual mode')

- Verify that all packages are in an upgradable state:

    dpkg --audit

- Show currently installed kernel(s):

    dpkg -l | grep linux-image

  Mine currently shows:

    ii linux-image-2.6-xen-686 2.6.26+17+lenny1 Linux 2.6 image on i686, oldstyle Xen suppor
    ii linux-image-2.6.26-1-xen-686 2.6.26-13lenny2 Linux 2.6.26 image on i686, oldstyle Xen sup
    ii linux-image-2.6.26-2-xen-686 2.6.26-26lenny2 Linux 2.6.26 image on i686, oldstyle Xen sup

- Confirm non-usage of grub2:

    dpkg -l | grep grub

  Mine currently shows:

    ii grub 0.97-47lenny2 GRand Unified Bootloader (Legacy version)
    ii grub-common 1.96+20080724-16 GRand Unified Bootloader, version 2 (common

- Update apt sources list from lenny to squeeze:

    sed -i s/lenny/squeeze/g /etc/apt/sources.list

- Manually edit /etc/apt/sources.list to confirm success of the above step and comment out any other repositories (non-Debian, backports etc) ?

- Upgrade the kernel: (*** Am I aiming for the right one here? ***)

    aptitude install linux-image-2.6-686-bigmem

- Update grub configuration:

    update-grub

- Remove clocksource=jiffies from kopt directive in /boot/grub/menu.lst and confirm correct kernel will be loaded (i.e. default # matches new kernel position)

- Upgrade udev (to minimise the risk of running the old udev with the new kernel):

    apt-get install udev

- Reboot

- Record a transcript of the upgrade session:

    script -t 2>~/upgrade-squeeze.time -a ~/upgrade-squeeze.script

  (This can be reviewed at a later date with scriptreplay ~/upgrade-squeeze.time ~/upgrade-squeeze.script)

- Update the package list:

    apt-get update

- Perform a minimal upgrade (i.e. upgrade those packages that don't require installation/removal of any other package(s)):

    apt-get upgrade

- Complete the rest of the upgrade:

    apt-get dist-upgrade

- Remove old/obsolete packages no longer required:

    apt-get autoremove

- (Hopefully:) After the dust settles, advise Support that the snapshot of the old system can be removed

How does that all look? Please don't hold back...

Regards,
Mathew
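For the sources list step in the plan above, one way to see what the substitution would do before running sed in place is to apply it to a copy of the file's text first. The sketch below is only an illustration: the function name is hypothetical, and unlike the sed command in the plan it deliberately touches only active "deb" and "deb-src" lines, leaving comments and commented-out repositories as they were.

```python
def retarget_sources(text, old="lenny", new="squeeze"):
    """Return `text` with the release name changed on active deb/deb-src lines.

    Assumes entries start with "deb " or "deb-src " (space separated);
    comment lines are returned unchanged.
    """
    result = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(("deb ", "deb-src ")):
            line = line.replace(old, new)
        result.append(line)
    return "".join(result)


if __name__ == "__main__":
    # Read-only preview: print the proposed file without modifying anything.
    with open("/etc/apt/sources.list") as handle:
        print(retarget_sources(handle.read()), end="")
```

Comparing the printed result with the current file shows exactly which entries would change, which covers the manual confirmation step in the plan.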

12 years, 9 months
