Hi,
Some of you may recall the additional 120M of RAM that was promised
for the end of January. If you don't know what I'm talking about
then it's likely that you're one of the newer customers who was
already set up on the larger plans, and none of this applies to you.
If you're not sure, see the current price list on the web site and
compare it with what you're paying.
The first attempt at this - just putting 48G of RAM into the
machines and rebooting them - hit problems when it turned out that
32-bit Xen cannot see more than 16G of RAM. That didn't provide
enough RAM for everyone to have the upgrade, and represented a waste
besides.
I then decided to use what was at the time a cold spare server to
experiment with a 64-bit hypervisor on a 32-bit userland. If that
worked, it would be something I could replicate on the other servers
with just some package installs and a reboot, as opposed to a
complete reinstall.
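For the curious, that combination is visible from the dom0 itself.
A rough sketch, with output abbreviated -- the exact capability
strings vary between Xen versions:

    xm info | grep xen_caps      # guest types the hypervisor supports
    xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p ...
    getconf LONG_BIT             # while the userland can still be 32-bit
    32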
Well, it hasn't been as successful as I would like. CentOS guests
seem to be incredibly unstable, and I think it's too risky to go
100% with that configuration.
In the meantime I have had to advertise the new plans on the web
site because the whole point of this was to keep Bitfolk
competitive. This became even more important recently when Bytemark
announced their plan upgrades.
My new plan is to install more hardware, build it 64-bit to begin
with, and evaluate running 32-bit guests on it. Why not 64-bit
guests, you may ask? The reason is that being able to move VPSes
around between hosts is essential for redundancy purposes, and
having only one host that can run 64-bit guests rules that out for
now. This will be revisited in a few months.
If that proves to be stable then I will need to slowly empty the
current servers onto the new ones and then rebuild the old ones.
That's going to take months.
People are quite understandably keen to have the upgrade ASAP and I
do sympathise with this. It isn't right that new customers should
get more for the same price and that was never the intention. There
has been talk of cancelling services and then buying them new again,
which is more work for everyone involved but would force the issue
in your favour.
To try to avoid this sort of thing as much as possible I have had a
look and it seems like by not taking on new customers for a while
and shuffling a few guests around I can provide the promised upgrade
to all customers *on certain hosts*. With a bit more shuffling I
can *probably* do the same for most of the remaining hosts. I will
still need to eventually move everyone off all the old servers and
rebuild them, because there's a total of 24G of RAM not currently
accessible.
So anyway, first up are customers on curacao. If you shut down
(*NOT* reboot!) and boot again, you should get an additional 120M of
RAM. If you aren't sure whether you're going to get extra or not
and want to check, look at how much RAM you have now and then log on
to https://panel.bitfolk.com/ which will tell you what your plan is.
If your plan is bigger, shut down + boot and that's what you'll get.
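If it helps, here's a rough sketch of doing that from inside the
guest (the boot afterwards has to happen from outside, e.g. from
the Xen shell):

    free -m              # note your current total
    shutdown -h now      # a full halt -- NOT a reboot
    # then 'boot' the VPS again from the Xen shell and re-check free -m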
In the next few days I will do the same on obstler.
Not everyone on kahlua or kwak would quite fit if you all got
upgraded. I will probably give the upgrade to those paying yearly
or quarterly first and then attempt to move some things around to
see if I can give more. It's not far off working, so it should be
possible.
corona on the other hand is a bit of a lost cause. It was already
maxed out at 8G of RAM, so it didn't get any extra, and there's
nothing that can be done. All I can do here for those who are
desperate for the extra RAM is try to move you to other machines.
I have stopped accepting new customers until the new servers are
tested because otherwise things are just going to get dragged out
more. There will be natural churn of customers so from time to time
there will be space for those on the waiting list.
In the meantime, if anyone is happy with their existing amount of
RAM and wants to downgrade to the new plan that gives them the same
amount of RAM for a lower cost, drop an email to
support@bitfolk.com and that will happen. People on quarterly or
yearly payments can too, and will get service credit back.
Any questions, feel free to ask on or off list.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi folks,
I'm afraid that the host obstler went a bit strange whilst one of
you was shutting down your VPS. The dom0 reported the following:
Mar 20 11:04:38 obstler kernel: unregister_netdevice: waiting for v-jamesw to become free. Usage count = 1
over and over, and starting new VMs became impossible.
I had no choice but to shut down all VMs, reboot the server and
then start them again. You would have seen a clean shutdown and an
outage until around 1117Z, when VMs started to be brought up again.
All were up by 1126Z.
I am very sorry for the outage and will step up my investigations on
this -- it happened once before, 60 days ago.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi,
I thought I'd better point out the reason that Bitfolk does not
offer Ubuntu Intrepid: the Intrepid Xen kernel will not work on
anything except an Ubuntu Intrepid host server, and Bitfolk does not
run Ubuntu.
I reported this as a bug ages ago and the outcome was that they
don't consider it a bug and have no intention of changing it (it
will not be fixed in Jaunty or any later release either).
So if you are planning to upgrade to Intrepid please bear in mind
that you will need to use a non-Intrepid kernel, such as the one
from Hardy, or an upstream one or something like that. Basically I
do not recommend it.
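If you do go ahead, one approach is to keep a hardy line in your
APT sources and pull the Xen-capable kernel from there. This is
only a rough sketch and the exact package name is my assumption, so
check what's actually available:

    # in /etc/apt/sources.list, alongside the intrepid entries:
    deb http://archive.ubuntu.com/ubuntu hardy main restricted

    apt-get update
    apt-get install linux-image-xen/hardy   # request it from hardy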
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
No problem here. We are getting what was advertised. It is not very
sensible to complain that others also get what they have contracted
for.
But you can never please all of the people all of the time.
Steve
Hi Andy,
I use the spamd server that you provide, so first of all thanks.
I have been seeing exim complain about a lot more timeouts recently and
I was wondering if the load had significantly increased on that server
or if perhaps you've added some tests that are taking longer.
Regards,
n
Hi,
Short story:
I would recommend that everyone using Debian Lenny, Ubuntu Hardy or
later follow the instructions in:
http://wiki.debian.org/Xen#head-2994c37779fecc0d4d17a00bbdbd7e018b598874
Longer story:
Ever since kahlua was rebooted last week, causing all the VPSes on
it to be suspended and restored, some customers found that their
clocks got stuck on March 4th. This sort of message was found repeated many
times in their logs:
Mar 4 18:51:07 host kernel: [4491822.326661] clocksource/0: Time went backwards: ret=1c336e30802c7 delta=-3996610936036461 shadow=1c336ba8c7f37 offset=287bcb9e
It became impossible for them to set the time, cron wouldn't run
jobs, and there was various other brokenness.
This seemed to only affect Debian Lenny and Ubuntu Hardy VPSes,
although not all of them -- I have several Lenny VPSes on that
machine for administrative purposes and they were all fine. I'm
guessing it depended on exactly what the VPS was doing at the time
of the suspend/restore.
It was not possible for those affected to do a shutdown, and so they
ended up having to do a "destroy" then "boot" from the Xen shell.
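From the Xen shell that looks roughly like:

    destroy    # hard-stop the stuck VPS, since a clean shutdown hangs
    boot       # then start it again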
As a few people had reported this I had been searching for a fix
and did find:
http://wiki.debian.org/Xen#head-2994c37779fecc0d4d17a00bbdbd7e018b598874
but I was hoping to be able to replicate the problem on one of
Bitfolk's own internal or testing VPSes.
Today a helpful customer dropped by the IRC channel while he was
experiencing this problem on his Debian Lenny VPS, and the above
workaround fixed it for him. Later on this evening another customer
with the same issue was also helped by the above.
So, if you're running Debian Lenny or Ubuntu Hardy (or
newer versions of either of these) I would recommend that you follow
the instructions in the above link. I'll be using that
configuration by default for new Lenny/Hardy VPSes until this is
sorted out upstream.
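For reference, the workaround on that page boils down to letting
the guest keep its own wallclock and switching clocksource. The
following is a sketch from memory, so do follow the wiki page
rather than this:

    # stop the domU syncing its clock from dom0
    echo 1 > /proc/sys/xen/independent_wallclock
    echo 'xen.independent_wallclock = 1' >> /etc/sysctl.conf

    # use the jiffies clocksource (or add clocksource=jiffies to the
    # kernel command line), then keep time with ntp in the guest
    echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource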
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi folks,
I'm about to have to reboot the host kahlua. It seems to have got
itself into a state where any VPS that is shut down or rebooted
leaves its devices behind and then they can't be started again.
I've tried to fix this without a reboot but I'm not getting anywhere.
Hopefully when I shut it down the VPSes will be suspended to disk
and then restored when it comes back, but the state it's in now
makes me worry that this may not succeed, so do prepare for the
possibility of a full shutdown/boot.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi,
Here follows the outage report for the networking issue that occurred
on the morning of Friday 20th. If you have any questions please
feel free to ask either on or off-list; I'll answer if I can or pass
them on to James if I can't.
I understand that James has now completed his reorganisation of how
the terminal servers and masterswitches are accessed.
Cheers,
Andy
----- Forwarded message from "James A. T. Rice - Jump" -----
Hi Folks,
Apologies for the outage this morning. I'll attempt to describe
what's been found and what is being done to help prevent it
happening again and to recover from something similar more quickly
in future.
It looks like the first signs of partial connectivity instability
appeared at around 0500Z, and at around 0630Z it suddenly became
quite severe.
The sup-tfm1 router had disabled CEF due to a malloc failure but was
still mostly forwarding traffic and manageable. The sup-tfm4 router,
however, had become very unstable: BGP sessions were flapping
constantly, CPU was saturated, and management of the device was
impossible.
VRRP should normally take care of failing over from a dead device;
however, sup-tfm4 was still announcing itself as the preferred
gateway, despite not being in a position to do so reliably.
The terminal servers, which speak BGP to the border routers, were mostly
accessible, as BGP to the mostly dead router was failing to fully
establish.
Reloading of sup-tfm4 was completed at about 0840Z; sup-tfm1
(generally the backup router) was upgraded to a newer IOS and
reloaded at 1000Z.
To help prevent this happening again, the following has been done:
* Upgrade IOS on sup-tfm1 (removes a few memory leaks, fragmentation problems, etc).
* Reduce the number of full BGP tables carried from 5 to 3 on each router.
There will be a plan to upgrade the IOS on sup-tfm4 in due course.
To help make resolution quicker should similar situations occur, I'm
currently revamping the management devices (masterswitches, terminal
servers, etc) so that the terminal servers do not depend on the
routers to be able to reach the masterswitches, as well as
sanity-checking all the configurations and improving them where
possible.
Again, apologies for the inconvenience caused. I'm sorry we hadn't
taken the actions to streamline the management network earlier --
it's been on the todo list. Please feel free to rant etc. on
jump-discuss, or to me, or to support.
Thanks
James
----- End forwarded message -----
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi,
There was effectively a complete network outage at Bitfolk's
upstream provider, jump.net.uk, between around 06:34 and 08:39 GMT
this morning.
Other than that it was related to their routers, I do not as yet
have any further information. I will provide it when I do.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
Hi,
So, the time has come with the Lenny release for me to consider
upgrading my VPS from etch. I remember a thread a few months ago where
it was mentioned that Lenny's kernel had problems acting as a Xen domU.
Is this still the case? Any other gotchas that I should be aware of?
Iain