BitFolk Users March 2021

users@mailman.bitfolk.com

9 participants
5 discussions

by Hugh Frostick

Hi,

I am a (delighted!) relatively new BF user and run two dozen websites under CentOS and Virtualmin, with no email as I keep email off my webserver. I am fed up with cPanel in multiple ways and want to drop the server where I currently have all my email and mail forwarders. Is another VPS on CentOS with Virtualmin a good route to manage my and my clients’ email? Or is there a better solution for a mail server?

Cheers
Hugh

3 years, 5 months

Major maintenance scheduled for 2021-04-20 (1 server) and 2021-04-27 (another 6 servers)

by Andy Smith

Hello all,

A web version of this email with any updates that have come to light since posting is available here:

https://tools.bitfolk.com/wiki/Maintenance/2021-04-Re-racking

== TL;DR:

We need to relocate some servers to a different rack within Telehouse.

On Tuesday 20 April 2021 at some point in the 2 hour window starting at 20:00Z (21:00 BST) all customers on the following server will have their VMs either powered off or suspended to storage:

* hen.bitfolk.com

We expect to have it powered back on within 30 minutes.

On Tuesday 27 April 2021 at some point in the 4 hour window starting at 22:00Z (23:00 BST) all customers on the following servers will have their VMs either powered off or suspended to storage:

* clockwork.bitfolk.com
* hobgoblin.bitfolk.com
* jack.bitfolk.com
* leffe.bitfolk.com
* macallan.bitfolk.com
* paradox.bitfolk.com

We expect the work on each server to take less than 30 minutes.

See "Frequently Asked Questions" at the bottom of this email for how to determine which server your VM is on.

If you can't tolerate a ~30 minute outage at these times then please contact support as soon as possible to ask for your VM to be moved to a server that won't be part of this maintenance.

== Maintenance Background

Our colo provider needs to rebuild one of their racks that houses 7 of our servers. This is required because the infrastructure in the rack (PDUs, switches etc) is of a ten year old vintage and all needs replacing. To facilitate this, all customer hardware in that rack will need to be moved to a different rack or sit outside of the rack while it is rebuilt.

We are going to have to move our 7 servers to a different rack. This is a significant piece of work which is going to affect several hundred of our customers, more than 70% of the customer base. Unfortunately it is unavoidable.

== Networking upgrade

We will also take the opportunity to install 10 gigabit NICs in the servers which are moved. The main benefit of this will be faster inter-server data transfer for when we want to move customer services about. The current 1GE NICs limit this to about 90MiB/sec.

== Suspend & Restore

If you opt in to suspend & restore then instead of shutting your VM down we will suspend it to storage and then when the server boots again it will be restored. That means that you should not experience a reboot, just a period of paused execution. You may find this less disruptive than a reboot, but it is not without risk. Read more here:

https://tools.bitfolk.com/wiki/Suspend_and_restore

== Avoiding the Maintenance

If you cannot tolerate a ~30 minute outage during the maintenance windows listed above then please contact support to agree a time when we can move your VM to a server that won't be part of the maintenance.

Doing so will typically take just a few seconds plus the time it takes your VM to shut down and boot again and nothing will change about your VM. If you have opted in to suspend & restore then we'll use this to do a "semi-live" migration. This will appear to be a minute or two of paused execution.

Moving your VM is extra work for us which is why we're not doing it by default for all customers, but if you prefer that to experiencing the outage then we're happy to do it at a time convenient to you, as long as we have time to do it and available spare capacity to move you to. If you need this then please ask as soon as possible to avoid disappointment.

It won't be possible to change the date/time of the planned work on an individual customer basis. This work involves 7 of our servers, will affect several hundred of our customers, and also has needed to be scheduled with our colo provider and some of their other customers. The only per-customer thing we may be able to do is move your service ahead of time at a time convenient to you.

== Rolling Upgrades Confusion

We're currently in a cycle of rolling software upgrades to our servers. Many of you have already received individual support tickets to schedule that. It involves us moving your VM from one of our servers to another and full details are given in the support ticket.

This has nothing to do with the maintenance that's under discussion here and we realise that it's unfortunately very confusing to have both things happening at the same time. We did not know that moving our servers would be necessary when we started the rolling upgrades.

We believe we can avoid moving any customer from a server that is not part of this maintenance onto one that will be part of this maintenance. We cannot avoid moving customers between servers that are both going to be affected by this maintenance. For example, at the time of writing, customer services are being moved off of jack.bitfolk.com and most of them will end up on hobgoblin.bitfolk.com.

== Further Notifications

Every customer is supposed to be subscribed to this announcement mailing list, but no doubt some aren't. The movement of customer services between our servers may also be confusing for people, so we will send a direct email notification to the main contact of affected customers a week before the work is due to take place.

So, on Tuesday 13 April we'll send a direct email about this to customers that are hosted on hen.bitfolk.com, and then on Tuesday 20 April we'll send a similar email to customers on all the rest of the affected servers.

== 20 April Will Be a Test Run

We are only planning to move one server on 20 April. The reasons for this are:

* We want to check our assumptions about how long this work will take, per server.
* We're changing the hardware configuration of the server by adding 10GE NICs, and we want to make sure that configuration is stable.

The timings for the maintenance on 27 April may need to be altered if the work on 20 April shows our guesses to be wildly wrong.

== Frequently Asked Questions

=== How do I know if I will be affected?

If your VM is hosted on one of the servers that will be moved then you are going to be affected. There's a few different ways that you can tell which server you are on:

1. It's listed on https://panel.bitfolk.com/
2. It's in DNS when you resolve <youraccountname>.console.bitfolk.com
3. It's on your data transfer email summaries
4. You can see it on a `traceroute` or `mtr` to or from your VPS.

=== If you can "semi-live" migrate VMs, why don't you just do that?

* This maintenance will involve some 70% of our customer base, so we don't actually have enough spare hardware to move customers to.
* Moving the data takes significant time at 1GE network speeds.

For these reasons we think that it will be easier for most customers to just accept a ~30 minute outage. Those who can't tolerate such a disruption will be able to have their VMs moved to servers that aren't going to be part of the maintenance.

=== Why are you needing to test out adding 10GE NICs to a live server? Isn't it already tested?

The main reason for running through this process first on one server only (hen) is to check timings and procedure before doing it on another six servers all at once. The issue of installing 10GE NICs is a secondary concern and considered low risk.

The hardware for all 7 of the servers that are going to be moved is obsolete now, so it's not possible to obtain identical spares now. The 10GE NICs have been tested in general, but not with this specific hardware, so it's just an extra cautionary measure.

The 10GE NICs will not be in use immediately in order to avoid too much change at once, but this still does involve plugging in a PCIe card which on boot will load a kernel module so while the risk is considered low, it's not zero.

=== Further questions?

If there's anything we haven't covered or you need clarified please do ask here or privately to support.

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
announce mailing list
announce(a)lists.bitfolk.com
https://lists.bitfolk.com/mailman/listinfo/announce
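For anyone who wants to check the windows in the announcement above against their own server, here is a minimal Python sketch. The server names, dates and window lengths are copied from the announcement. Everything else is an assumption made for illustration: the helper names (`short_name`, `maintenance_window`, `transfer_minutes`) are hypothetical, BST is modelled as a fixed UTC+1 offset, and the transfer estimate simply divides by the roughly 90MiB/sec quoted for the 1GE NICs, ignoring any other overhead. You still need to find your server name yourself, for example on the panel.

```python
from datetime import datetime, timedelta, timezone

# Window start (UTC) and length per server, copied from the announcement.
WINDOWS = {
    "hen": (datetime(2021, 4, 20, 20, 0, tzinfo=timezone.utc), timedelta(hours=2)),
}
for _name in ("clockwork", "hobgoblin", "jack", "leffe", "macallan", "paradox"):
    WINDOWS[_name] = (
        datetime(2021, 4, 27, 22, 0, tzinfo=timezone.utc),
        timedelta(hours=4),
    )

# Assumption: UK summer time treated as a fixed UTC+1 offset.
BST = timezone(timedelta(hours=1), "BST")


def short_name(host):
    """Reduce 'hen.bitfolk.com' (any case, optional trailing dot) to 'hen'."""
    return host.strip().lower().rstrip(".").split(".")[0]


def maintenance_window(host, tz=BST):
    """Return (start, end) of the window in tz, or None if host is not listed."""
    entry = WINDOWS.get(short_name(host))
    if entry is None:
        return None
    start, length = entry
    return start.astimezone(tz), (start + length).astimezone(tz)


def transfer_minutes(gib, mib_per_sec=90.0):
    """Rough minutes needed to copy `gib` GiB at a sustained rate in MiB/sec."""
    return gib * 1024 / mib_per_sec / 60


if __name__ == "__main__":
    start, end = maintenance_window("hen.bitfolk.com")
    print(start.strftime("%Y-%m-%d %H:%M %Z"))  # 2021-04-20 21:00 BST
    print(round(transfer_minutes(100)))  # about 19 minutes for 100GiB
```

A name that is not in the list, such as a server outside the affected rack, returns `None`. The transfer figure is only a lower bound on how long a migration of that size would take at 1GE speeds.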

3 years, 11 months

Upcoming maintenance work / how to handle notifications

by Andy Smith

Hi,

There's some major and unavoidable upcoming maintenance work being undertaken by our colo provider. The effect on BitFolk is that 7 of our servers need to be physically removed from one rack and moved to a different rack in a different room in Telehouse.

This is part of the colo provider's need to rebuild the entire rack, so it's not in our control, is unavoidable and we have limited say over the schedule as it also affects all their other customers in that rack. It's a big piece of work, though not complicated (for us).

I'm writing to you because I'm not sure of the best way to notify customers about this work and would like you to give me some feedback on how you'd prefer the communication to work.

We haven't yet agreed firm dates/times but it's going to be no sooner than a month from now and probably no later than six weeks from now.

We're going to be upgrading the servers to 10GE networking so we're going to move one of them first, wait a week to be sure the hardware is stable in that configuration and then do the remaining six the following week.

The first one may happen evening UK time, like 9pm or something, but the rest will likely happen a bit later, around midnight into early hours. So if affected, assume you're in that latter group, and expect half an hour or so of being powered off.

An additional complication is that this comes while we are right in the middle of doing rolling upgrades of our fleet of servers. That work was started before I was made aware that the server move would be necessary, otherwise I might have postponed it.

Some of you will have already been through that rolling upgrade process or be going through it now. As that's completely within our control we've been moving customers between servers one by one, at times convenient to you and individual to you, and then upgrading the server once it's empty.

The consequence of that is, we don't know which customers will be on which servers come the date of the move. We could send out personalised notices as soon as we know the date, but some of those people won't be on the affected servers when the time comes, and some people who didn't get the notice WILL be affected when the time comes.

As part of the rolling upgrades I'm trying to avoid moving anyone from a server that's not going to be moved onto one that will be moved. It is however unavoidable that some people will be moved between two servers that are going to be moved, so some will get one very short outage for the move of their VM and then the later longer outage when the whole server is relocated.

So is there any value in sending out personalised advance notice of maintenance more than a month ahead? Would it be better just to send notice to the announce@ address giving the list of affected servers and the time it's going to happen and then a refresher a week ahead and again a day ahead or something?

If the prospect of being powered off for half an hour or so a month from now is not acceptable to anyone, then we can most likely move their VM to another server ahead of time, one that we know won't be involved in the move. By semi-live migration if necessary. That's the only per-customer thing we can do and it's extra work so we will only do it if people ask for it. There isn't enough spare capacity to do it for everyone so doing migration as default for everyone is not an option. But happy to do it on a case by case basis.

Your thoughts?

I just want to reiterate that these server moves (part of larger work involving our colo provider and their other customers) cannot be negotiated individually with the several hundred BitFolk customers that will be affected. I'm asking you only about broad arrangements for notice that can be applied to the whole customer base. Once the date/time is decided the only per-customer action that can take place is whether you require us to do extra work to migrate your VM ahead of time, and you don't need to tell me that now as there will be an announcement when the date is known (soon, possibly even today).

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

4 years

Upcoming maintenance work / how to handle notifications

by SJW

> I'm writing to you because I'm not sure of the best way to notify
> customers about this work and would like you to give me some
> feedback on how you'd prefer the communication to work.

I am strictly amateur with nothing mission critical. I will not have any problems with whatever you decide to do.

Steve

4 years

Mariadb won't start (unable to connect to socket)

by andy.duffell＠gmail.com

Morning,

Can anyone help me straighten out MariaDB?

Background: I let the disk get completely full on my VPS (oops!). Bought another 5GB but the system was pretty unresponsive due to lack of disk space. Rebooted, thinking to dump some old kernels to free space, had to do this in Xen and it took an absolute age to shut down. Booted and resized disk successfully, but...

The issue: MariaDB will not start. Presumably did not shut down happily. Returns:

    Job for mariadb.service failed because the control process exited with error code.
    See "systemctl status mariadb.service" and "journalctl -xe" for details.

Looking at systemctl status mariadb.service:

    mariadb.service - MariaDB 10.1.47 database server
       Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
       Active: failed (Result: exit-code) since Fri 2021-03-12 10:17:53 UTC; 10min ago
         Docs: man:mysqld(8)
               https://mariadb.com/kb/en/library/systemd/
      Process: 11875 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
      Process: 11801 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -
      Process: 11799 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
      Process: 11798 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
     Main PID: 11875 (code=exited, status=1/FAILURE)
       Status: "MariaDB server is down"

Webmin also returns this fault code when prompted to start the DB:

    DBI connect failed : Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Running Ubuntu 18.04. User skill level: probably just about enough knowledge to be dangerous.

Cheers,
Andy

4 years
