So, I asked apt what it could remove and it told me that 'grub-xen-bin'
was no longer needed.
This is not something I installed explicitly and I'm wondering if this
is something I need to retain or not. The last mention of this kind of
stuff I see on the list was from 2017... before I even created this VPS
image.
Ta,
n
Hi All,
I was hoping I could get a bit of help with my Xen shell. I'm trying to get
into an otherwise defunct VPS I have. When I try to get into rescue mode I
get the following error:
libxl: error: libxl_domain.c:81:libxl__domain_rename: Domain 228:Domain
with name "rutabaga" already exists.
libxl: error: libxl_create.c:975:initiate_domain_create: Domain 228:cannot
make domain: -6
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read
failed: `/libxl/228/type': No such file or directory
libxl: warning: libxl_dom.c:54:libxl__domain_type: unable to get domain
type for domid=228, assuming HVM
libxl: error: libxl_domain.c:1038:libxl__destroy_domid: Domain
228:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 228:Unable
to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 228:Destruction
of domain failed
Does anyone have any advice for getting into rescue mode? I don't want to do
much, just copy the contents of my home directory to my local machine.
Don't care what happens to the contents of the VPS after that.
Thanks
Hi,
After receiving a number of alerts for VMs hosted on server "jack",
I investigated and found the server largely unresponsive.
Unfortunately I had no option but to forcibly reboot it, which I did
at about 06:47Z
It's now 07:01Z and monitoring says everything is back up, except
for two customer VMs which are waiting for a LUKS passphrase on
their console.
This problem was the same as what was experienced with some of the
other servers a few months ago. With the months-long gap I had hoped
it was some undiagnosed kernel issue which we had got past, but
apparently not, as "jack" is on the latest available kernel package.
I'm pursuing some ideas about a config change that may help, and I
managed to put that into place before "jack" was rebooted - it does
require a reboot so if it does help it won't be able to take effect
on the others until next reboot. On the other hand it doesn't hurt
either, so I've made the same change elsewhere also.
If that doesn't fix things then the next line of investigation will
be an upgrade of the hypervisor to latest stable release, though
that is a rather major undertaking.
Apologies for the disruption. It is challenging to debug a problem
that can take several months to occur, with no reliable way of
triggering it. :(
Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
announce mailing list
announce(a)lists.bitfolk.com
https://lists.bitfolk.com/mailman/listinfo/announce
Hi,
We don't have a 100% firm date for this yet and I don't want to
announce it properly while the other moves are happening but I also
want to try to give a month of notice: The servers that are not
getting relocated today & 27 April will very likely be relocated on
27 May.
That is:
- elephant
- limoncello
- snaps
- talisker
Now because of the movement of individual customers that is still
taking place, it is very likely that everyone who's currently on
"snaps" will be moved off of it before then, in which case we will
relocate it at a time convenient to us as it won't affect anyone.
The other three have already had an OS upgrade so they definitely
will be relocated all together on the same night, which as I say is
very likely to be 27 May.
As before I will send another mail here when we have confirmation
and then we will send a direct email a week before to everyone who
will be affected. And as before it will be possible to have your
service moved about ahead of time at a time of your choosing.
While I'm here a reminder that:
- Server "hen" is being relocated tonight at some point between
21:00 and 23:00 BST (20:00 to 22:00 UTC). Those affected would
have received individual notifications about that a week ago.
- Servers "clockwork", "hobgoblin", "jack", "leffe", "macallan" and
"paradox" will be relocated on 27 April and later today all those
who will be affected will receive a direct email about this.
The full details of those works are at:
https://tools.bitfolk.com/wiki/Maintenance/2021-04-Re-racking
Cheers,
Andy
Sorry if I'm a bit slow responding - work is hectic and the work/life
balance is a little skewed!
I'm using a TSIG key and DNSSEC mechanism to update the zone - I've been
using the zytrax books as reference:
https://www.zytrax.com/books/dns/ch7/xfer.html#allow-update
But as the names are dynamically updated, there is no change to the serial
number.
Chris
> > 1. In the zone on your server is the serial number greater than the
> > copy on the Bitfolk DNS Servers (dig soa domain.com
> > @a.authns.bitfolk.co.uk) - repeat for each, including your own?
>
> This is what I thought of too. To phrase it a bit differently, when I
> first started doing similar to you (not dynamic DNS, but a 'hidden
> master' with my VPS/A&A as secondaries which actually get used by
> machines on the internets), I initially was forgetting to update the
> serial sometimes. Then the NOTIFY gets sent, but the transfer doesn't
> happen properly because the other side thinks there's nothing for it to
> fetch.
>
> Maybe that's what is happening here too.
>
> Cheers,
>
> --
> Iain Lane [ iain(a)orangesquash.org.uk ]
> Debian Developer [ laney(a)debian.org ]
> Ubuntu Developer [ laney(a)ubuntu.com ]
>
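The serial check described in the quoted text can be sketched in Python. This is only an illustration of RFC 1982 serial number arithmetic, which is how 32-bit SOA serials are compared; the function names are made up for this example, and the serials would come from running `dig soa` against each server by hand.

```python
# Sketch only: RFC 1982 serial number arithmetic for 32-bit SOA serials.
# Function names here are hypothetical, not part of BIND or any library.
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)
MOD = 2 ** SERIAL_BITS


def serial_gt(a: int, b: int) -> bool:
    """Return True if serial a is 'greater than' serial b per RFC 1982."""
    if not (0 <= a < MOD and 0 <= b < MOD):
        raise ValueError("SOA serials are unsigned 32-bit integers")
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


def secondary_will_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A secondary only fetches the zone if the primary's serial is newer."""
    return serial_gt(primary_serial, secondary_serial)
```

If `secondary_will_transfer()` is False for the serials you see, the NOTIFY may well arrive but the secondary has no reason to transfer anything.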
This is a bit out of scope for BitFolk support, but I feel like I'm
treading out of my depth a little here.
I've set up Bind9 and it's chuntering away quite happily, with BitFolk name
servers happily serving up DNS for my little corner of the internet. Now
I'm trying to set up an RFC 2136 client for Dynamic DNS, so that my home
router can connect and notify the outside world what my current IP is (via
an A or AAAA record), so that I can remote in from outside. I'd also like
to use wildcard SSL certificates from LetsEncrypt, but that too likes to
play with Dynamic DNS these days (TXT records).
I know that it's apparently working - I can see the log entries for Bind9
reporting the new updates, and I see a .jnl file for my dynamic zone. I've
set up "notify yes;" for the dynamic zone, and made sure that I only send
AXFR (by default, IXFR is enabled, I've tried turning it off but I get the
same result), but I can't see if those updates are getting sent (should I
see that update going out in the logs?).
If I dump the cache, I can see that as far as the master is concerned,
everything is complete.
Can anyone help guide this poor soul?
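For anyone following along, here is a rough Python sketch of the kind of RFC 2136 update a home client might send, expressed as input for `nsupdate`. It is not the poster's actual setup: the server, zone and host names are placeholders, the function name is invented, and it says nothing about whether the resulting change reaches the secondaries.

```python
import ipaddress


def build_nsupdate_script(server: str, zone: str, name: str,
                          address: str, ttl: int = 300) -> str:
    """Build nsupdate input that replaces the A/AAAA record for `name`.

    All arguments are illustrative; substitute your own names and address.
    """
    ip = ipaddress.ip_address(address)           # raises ValueError if invalid
    rtype = "A" if ip.version == 4 else "AAAA"   # pick record type from the address
    lines = [
        f"server {server}",
        f"zone {zone}",
        f"update delete {name} {rtype}",         # drop the old address first
        f"update add {name} {ttl} {rtype} {ip}",
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values from the documentation address ranges.
    print(build_nsupdate_script("ns.example.com", "example.com",
                                "home.example.com.", "192.0.2.10"), end="")
```

The output could be fed to `nsupdate -k <keyfile>` so that the update is signed with the TSIG key.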
Hello all,
A web version of this email with any updates that have come to light
since posting is available here:
https://tools.bitfolk.com/wiki/Maintenance/2021-04-Re-racking
== TL;DR:
We need to relocate some servers to a different rack within
Telehouse.
On Tuesday 20 April 2021 at some point in the 2 hour window starting
at 20:00Z (21:00 BST) all customers on the following server will
have their VMs either powered off or suspended to storage:
* hen.bitfolk.com
We expect to have it powered back on within 30 minutes.
On Tuesday 27 April 2021 at some point in the 4 hour window starting
at 22:00Z (23:00 BST) all customers on the following servers will
have their VMs either powered off or suspended to storage:
* clockwork.bitfolk.com
* hobgoblin.bitfolk.com
* jack.bitfolk.com
* leffe.bitfolk.com
* macallan.bitfolk.com
* paradox.bitfolk.com
We expect the work on each server to take less than 30 minutes.
See "Frequently Asked Questions" at the bottom of this email for how
to determine which server your VM is on.
If you can't tolerate a ~30 minute outage at these times then please
contact support as soon as possible to ask for your VM to be moved
to a server that won't be part of this maintenance.
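As a rough illustration of the schedule above, the following Python sketch maps a server name to its maintenance window. The server names, dates and window lengths are taken from this email; the function and variable names are invented for the example, and BST is modelled as a fixed UTC+1 offset, which only holds for these April 2021 dates.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
BST = timezone(timedelta(hours=1), "BST")  # fixed offset: an assumption valid for April 2021

# (window start in UTC, window length), as listed in the announcement
WINDOWS = {"hen": (datetime(2021, 4, 20, 20, 0, tzinfo=UTC), timedelta(hours=2))}
for _name in ("clockwork", "hobgoblin", "jack", "leffe", "macallan", "paradox"):
    WINDOWS[_name] = (datetime(2021, 4, 27, 22, 0, tzinfo=UTC), timedelta(hours=4))


def maintenance_window(server: str):
    """Return (start, end) in UTC for an affected server, or None otherwise."""
    short = server.lower().split(".")[0]   # accept "hen" or "hen.bitfolk.com"
    entry = WINDOWS.get(short)
    if entry is None:
        return None
    start, length = entry
    return start, start + length


if __name__ == "__main__":
    window = maintenance_window("hen.bitfolk.com")
    if window:
        start, end = window
        print(start.astimezone(BST).strftime("%d %b %H:%M %Z"), "to",
              end.astimezone(BST).strftime("%d %b %H:%M %Z"))
```

A result of None means the server is not part of this maintenance.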
== Maintenance Background
Our colo provider needs to rebuild one of their racks that houses 7
of our servers. This is required because the infrastructure in the
rack (PDUs, switches etc) is of a ten year old vintage and all needs
replacing. To facilitate this, all customer hardware in that rack
will need to be moved to a different rack or sit outside of the rack
while it is rebuilt. We are going to have to move our 7 servers to a
different rack.
This is a significant piece of work which is going to affect several
hundred of our customers, more than 70% of the customer base.
Unfortunately it is unavoidable.
== Networking upgrade
We will also take the opportunity to install 10 gigabit NICs in the
servers which are moved. The main benefit of this will be faster
inter-server data transfer for when we want to move customer
services about. The current 1GE NICs limit this to about 90MiB/sec.
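To put that figure in context, here is a small Python sketch of the arithmetic. The 90MiB/sec rate is the one quoted above; the 100GiB disk size is a made-up example, and the function names are invented for illustration.

```python
def line_rate_mib_per_s(gbit_per_s: float) -> float:
    """Theoretical maximum payload rate of a link, ignoring all protocol overhead."""
    return gbit_per_s * 1e9 / 8 / 2**20


def transfer_seconds(size_gib: float, rate_mib_per_s: float) -> float:
    """Time to copy size_gib of data at a sustained rate given in MiB/sec."""
    if rate_mib_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gib * 1024 / rate_mib_per_s


if __name__ == "__main__":
    # Hypothetical 100GiB of VM storage at the observed 1GE rate.
    secs = transfer_seconds(100, 90)
    print(f"1GE line rate: {line_rate_mib_per_s(1):.1f} MiB/sec")
    print(f"100 GiB at 90 MiB/sec: {secs / 60:.1f} minutes")
```

So a 1GE link tops out at roughly 119MiB/sec before overhead, and a hypothetical 100GiB of data takes about 19 minutes at the observed 90MiB/sec.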
== Suspend & Restore
If you opt in to suspend & restore then instead of shutting your VM
down we will suspend it to storage and then when the server boots
again it will be restored. That means that you should not experience
a reboot, just a period of paused execution. You may find this less
disruptive than a reboot, but it is not without risk. Read more
here:
https://tools.bitfolk.com/wiki/Suspend_and_restore
== Avoiding the Maintenance
If you cannot tolerate a ~30 minute outage during the maintenance
windows listed above then please contact support to agree a time
when we can move your VM to a server that won't be part of the
maintenance.
Doing so will typically take just a few seconds plus the time it
takes your VM to shut down and boot again and nothing will change
about your VM.
If you have opted in to suspend & restore then we'll use this to do
a "semi-live" migration. This will appear to be a minute or two of
paused execution.
Moving your VM is extra work for us which is why we're not doing it
by default for all customers, but if you prefer that to experiencing
the outage then we're happy to do it at a time convenient to you, as
long as we have time to do it and available spare capacity to move
you to. If you need this then please ask as soon as possible to
avoid disappointment.
It won't be possible to change the date/time of the planned work on
an individual customer basis. This work involves 7 of our servers,
will affect several hundred of our customers, and also has needed to
be scheduled with our colo provider and some of their other
customers. The only per-customer thing we may be able to do is move
your service ahead of time at a time convenient to you.
== Rolling Upgrades Confusion
We're currently in a cycle of rolling software upgrades to our
servers. Many of you have already received individual support
tickets to schedule that. It involves us moving your VM from one of
our servers to another and full details are given in the support
ticket.
This has nothing to do with the maintenance that's under discussion
here and we realise that it's unfortunately very confusing to have
both things happening at the same time. We did not know that moving
our servers would be necessary when we started the rolling upgrades.
We believe we can avoid moving any customer from a server that is
not part of this maintenance onto one that will be part of this
maintenance. We cannot avoid moving customers between servers that
are both going to be affected by this maintenance. For example, at
the time of writing, customer services are being moved off of
jack.bitfolk.com and most of them will end up on
hobgoblin.bitfolk.com.
== Further Notifications
Every customer is supposed to be subscribed to this announcement
mailing list, but no doubt some aren't. The movement of customer
services between our servers may also be confusing for people, so we
will send a direct email notification to the main contact of
affected customers a week before the work is due to take place.
So, on Tuesday 13 April we'll send a direct email about this to
customers that are hosted on hen.bitfolk.com, and then on Tuesday 20
April we'll send a similar email to customers on all the rest of the
affected servers.
== 20 April Will Be a Test Run
We are only planning to move one server on 20 April. The reasons for this are:
* We want to check our assumptions about how long this work will
take, per server.
* We're changing the hardware configuration of the server by adding
10GE NICs, and we want to make sure that configuration is stable.
The timings for the maintenance on 27 April may need to be altered
if the work on 20 April shows our guesses to be wildly wrong.
== Frequently Asked Questions
=== How do I know if I will be affected?
If your VM is hosted on one of the servers that will be moved then
you are going to be affected. There are a few different ways that you
can tell which server you are on:
1. It's listed on https://panel.bitfolk.com/
2. It's in DNS when you resolve <youraccountname>.console.bitfolk.com
3. It's on your data transfer email summaries
4. You can see it on a `traceroute` or `mtr` to or from your VPS.
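Option 2 can be sketched in Python as below. This is only an illustration: the account name is a placeholder, the function names are invented, and it assumes that the host server's name shows up as the canonical name returned by the resolver, which is not confirmed here. If it does not, the same information is on the panel.

```python
import socket


def console_name(account: str) -> str:
    """Build the console hostname for an account name (a placeholder here)."""
    return f"{account}.console.bitfolk.com"


def lookup_console(account: str, resolver=socket.gethostbyname_ex):
    """Resolve the console name and return (canonical_name, aliases, addresses).

    `resolver` is injectable so the function can be exercised without network.
    """
    canonical, aliases, addresses = resolver(console_name(account))
    return canonical, aliases, addresses


if __name__ == "__main__":
    # "youraccountname" is a placeholder; replace it with your own account.
    try:
        print(lookup_console("youraccountname"))
    except OSError as exc:  # DNS failure (socket.gaierror is an OSError)
        print("lookup failed:", exc)
```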
=== If you can "semi-live" migrate VMs, why don't you just do that?
* This maintenance will involve some 70% of our customer base, so we
don't actually have enough spare hardware to move customers to.
* Moving the data takes significant time at 1GE network speeds.
For these reasons we think that it will be easier for most customers
to just accept a ~30 minute outage. Those who can't tolerate such a
disruption will be able to have their VMs moved to servers that
aren't going to be part of the maintenance.
=== Why are you needing to test out adding 10GE NICs to a live server? Isn't it already tested?
The main reason for running through this process first on one server
only (hen) is to check timings and procedure before doing it on
another six servers all at once. The issue of installing 10GE NICs
is a secondary concern and considered low risk.
The hardware for all 7 of the servers that are going to be moved is
obsolete, so it's no longer possible to obtain identical spares.
The 10GE NICs have been tested in general, but not with this
specific hardware, so it's just an extra cautionary measure.
The 10GE NICs will not be in use immediately in order to avoid too
much change at once, but this still does involve plugging in a PCIe
card which on boot will load a kernel module so while the risk is
considered low, it's not zero.
=== Further questions?
If there's anything we haven't covered or you need clarified please
do ask here or privately to support.
Hi,
This doesn't overly matter hence why I'm just sending it to users@
and not announce@, but at around 03:12 BST (02:12Z) I mistakenly
opened support tickets with all customers hosted on "snaps"
regarding moving their service to a different server.
Although that move *does* have to happen at some point, we can't do
it until after the maintenance on 27 April is completed because
"snaps" isn't actually involved in that maintenance. If we went
ahead with doing what the ticket says, we'd be moving customers away
from a server that won't be affected onto one that will, which is
something we said we wouldn't do.
It was just a confusion on my part, forgetting that "snaps" wasn't
involved.
The reason why it doesn't really matter is that everything mentioned
in the tickets I'm talking about is still correct, it's just that it
gives you the impression that if you do nothing then the work might
happen as soon as a week from now. In reality it won't happen until
some time after 27 April.
I only mention this so that you don't wonder why you got the support
ticket and then nothing happened for weeks.
When I realised my mistake I then opened tickets with everyone on
the *correct* server ("leffe"). That happened at about 05:42 BST
(04:42Z).
I'm sorry about this confusion. If anything is not clear please say
so.
Cheers,
Andy
Hi,
There's some major and unavoidable upcoming maintenance work being
undertaken by our colo provider. The effect on BitFolk is that 7 of
our servers need to be physically removed from one rack and moved to
a different rack in a different room in Telehouse.
This is part of the colo provider's need to rebuild the entire rack,
so it's not in our control, is unavoidable and we have limited say
over the schedule as it also affects all their other customers in
that rack as well. It's a big piece of work, though not complicated
(for us).
I'm writing to you because I'm not sure of the best way to notify
customers about this work and would like you to give me some
feedback on how you'd prefer the communication to work.
We haven't yet agreed firm dates/times but it's going to be no
sooner than a month from now and probably no later than six weeks
from now. We're going to be upgrading the servers to 10GE networking
so we're going to move one of them first, wait a week to be sure the
hardware is stable in that configuration and then do the remaining
six the following week. The first one may happen evening UK time,
like 9pm or something, but the rest will likely happen a bit later,
around midnight into early hours. So if affected, assume you're in
that latter group, and expect half an hour or so of being powered
off.
An additional complication is that this comes while we are right in
the middle of doing rolling upgrades of our fleet of servers. That
work was started before I was made aware that the server move would
be necessary, otherwise I might have postponed it.
Some of you will have already been through that rolling upgrade
process or be going through it now. As that's completely within our
control we've been moving customers between servers one by one, at
times convenient to you and individual to you, and then upgrading
the server once it's empty.
The consequence of that is, we don't know which customers will be on
which servers come the date of the move. We could send out
personalised notices as soon as we know the date, but some of those
people won't be on the affected servers when the time comes, and
some people who didn't get the notice WILL be affected when the time
comes.
As part of the rolling upgrades I'm trying to avoid moving anyone
from a server that's not going to be moved onto one that will be
moved. It is however unavoidable that some people will be moved
between two servers that are going to be moved, so some will get one
very short outage for the move of their VM and then the later longer
outage when the whole server is relocated.
So is there any value in sending out personalised advance notice of
maintenance more than a month ahead? Would it be better just to send
notice to the announce@ address giving the list of affected servers
and the time it's going to happen and then a refresher a week ahead
and again a day ahead or something?
If the prospect of being powered off for half an hour or so a month
from now is not acceptable to anyone, then we can most likely move
their VM to another server ahead of time - one that we know won't be
involved in the move. By semi-live migration if necessary. That's
the only per-customer thing we can do and it's extra work so we will
only do it if people ask for it. There isn't enough spare capacity
to do it for everyone so doing migration as default for everyone is
not an option. But happy to do it on a case by case basis.
Your thoughts?
I just want to reiterate that these server moves (part of larger
work involving our colo provider and their other customers) cannot be
negotiated individually with the several hundred BitFolk customers
that will be affected. I'm asking you only about broad arrangements
for notice that can be applied to the whole customer base. Once the
date/time is decided the only per-customer action that can take
place is whether you require us to do extra work to migrate your VM
ahead of time, and you don't need to tell me that now as there will
be an announcement when the date is known (soon, possibly even
today).
Cheers,
Andy