Hi,
Unfortunately a batch of security advisories for Xen were posted
today, some of which affect the configuration in use at BitFolk:
http://xenbits.xen.org/xsa/
The details are under embargo until 22 November, so a reboot of all
hosts will need to take place before then. You should expect this
work to be carried out over a few days prior to the 22nd.
We will send out direct emails informing you of the two-hour window in
which the work affecting your VPS will take place, but those won't be
sent for a week or so, as the patches will need to be tested first.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Being able to store multiple contacts has been a long-requested
feature:
https://tools.bitfolk.com/redmine/issues/22
The first phase of this work has now been deployed to the panel web
site. That's just the ability to store and edit multiple contact
records.
If you visit:
https://panel.bitfolk.com/account/contacts/#toc-address-book
you'll be able to add/change/remove contact records.
These aren't very useful though until the other parts of BitFolk's
infrastructure can make use of them, and most of that work is still
to come.
The most common use of a different contact is for
monitoring/alerting, so that has been implemented first. You can
control who gets alerts by creating contact records and assigning
them to the "Alerting" role.
For those of you who have monitoring set up:
- If you have no contacts assigned to the "Alerting" role then
alerts will go to your main customer record.
- If you have at least one contact in the "Alerting" role then
alerts will go there instead. Each contact will get a copy.
- Everyone who already had a different email address set in our
monitoring has had a contact created and assigned to the
"Alerting" role for this purpose, so no action needs to be taken
to keep things behaving the same as they did before.
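To make the fallback behaviour in the list above concrete, here is a
minimal sketch of the routing rule in Python. The data layout and the
names are purely illustrative and are not our actual code:

    # Illustrative only: alerts go to every contact with the "Alerting"
    # role, falling back to the main customer record if there are none.
    def alert_recipients(customer):
        alerting = [c["email"] for c in customer["contacts"]
                    if "Alerting" in c["roles"]]
        return alerting or [customer["main_email"]]

    customer = {
        "main_email": "owner@example.com",
        "contacts": [
            {"email": "ops@example.org", "roles": ["Alerting"]},
            {"email": "backup@example.org", "roles": []},
        ],
    }
    print(alert_recipients(customer))   # ['ops@example.org']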
Other common requests for alternate contact details were for billing
and data transfer reports. These roles will be added as soon as the
relevant BitFolk systems are made to support them.
In the meantime, one useful thing to do, for those who care about
being contacted in an emergency, is to add at least one contact to the
"emergency" role, with a mobile phone number and/or an email address
that is not hosted on your VPS. We would only use those details to
contact you if we really needed to, when your main email address does
not seem to be working.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
TL;DR version:
A few customers will see new bandwidth graphs appearing in their
Cacti, which can be found at https://tools.bitfolk.com/cacti/
The already-existing graphs will continue to run for a while and
then will cease to be updated. The new graph will become your
primary bandwidth graph.
This is because some high-bandwidth users need to have graphs based
on 64-bit counters, not 32-bit, in order to accurately measure
bandwidth use.
It may result in slightly higher values being reported in Cacti, but
this is merely correcting previous under-reporting. Monthly totals
as presented in our emailed data transfer reports were/are correct.
Longer version:
While investigating some recent discrepancies between the different
systems we have for accounting for customer data transfer, I
discovered that all of our bandwidth graphs were using 32-bit SNMP
counters.
A 32-bit unsigned counter has a maximum value of 4,294,967,295. With
5-minute sampling, that means that an interface seeing around
114 Mbit/s of traffic will reach 4,294,967,295 and wrap around to zero
again before the counter can be read.
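For anyone who wants to check the arithmetic, here is the rough
calculation, assuming the standard SNMP interface octet counters
(which count bytes):

    # When does a 32-bit octet counter wrap within one 5-minute poll?
    counter_max = 2**32 - 1            # 4,294,967,295 octets (bytes)
    poll_interval = 5 * 60             # seconds

    wrap_rate = counter_max * 8 / poll_interval
    print(f"{wrap_rate / 1e6:.1f} Mbit/s")     # ~114.5 Mbit/s
    print(f"{counter_max / 1e9:.2f} GB/poll")  # ~4.29 GB in 5 minutes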
As a result, a few customers who routinely use large amounts of
bandwidth have Cacti graphs that are under-reporting their usage.
Here is an example of a graph based on 32-bit counters:
http://tools.bitfolk.com/cacti/graph_5062.html
Here is the same interface graphed from 64-bit counters:
http://tools.bitfolk.com/cacti/graph_5617.html
You can see that the first daily graph has several drop-outs around
high-bandwidth periods, and that the total data transferred in the
last 24 hours is under-reported in the first graph.
I will not go through and replace every bandwidth graph, only those of
customers who are seen to be transferring more than ~4.2GB in any
5-minute period. So if you see a new graph appear, this is why.
Using 32-bit counters to measure a 2x 1 gigabit interface was of
course a very silly oversight, and it is now apparent that even the
virtual network device in an individual VPS has no problem exceeding
~114 Mbit/s.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
By now you should have all received notification of the scheduled
maintenance that will be taking place in the early hours of the
morning (UK time) on 2016-09-02, 03 and 05.
This is the result of an embargoed security update that we were made
aware of today.
If you have not seen the notification which was sent directly to the
email address we have for you at:
https://panel.bitfolk.com/account/contacts/
please first check your spam folders, etc., and failing that please
do let us know.
Also, if you have any questions I am happy to answer them, either on
the users mailing list or directly in a private ticket at
support@bitfolk.com.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
By now all customers should have received notification of scheduled
maintenance that will be required due to a serious security flaw in
the hypervisor software that we use (Xen).
If you have not seen an email regarding this then please check your
spam folders etc.
The full details¹ are in the email you've already received and I'm
only sending this so as to have a public notification I can link to
when people raise support tickets to ask what is going on. :)
Anyway, the hosts have all been patched and the maintenance consists
of merely rebooting them to boot into the new hypervisor. This will
happen across three nights.
In previous non-SSD days this used to take around 30 minutes to shut
down all VPSes, reboot and boot them all again. These days I expect
it to be much shorter, maybe 5 minutes. So, you should see a clean
shut down followed by a boot a few minutes later.
It is important that you check that your VPS boots cleanly with all
the services you expect to be running. We offer free Nagios
monitoring, which can be useful for assuring yourself that everything
you expect to be running really is. Also, if Nagios looks more broken
afterwards than it was to start with then I will have a quick look.
If you are interested in having that set up then please contact
support@bitfolk.com.
Cheers,
Andy
¹ Well, not any details about the bug itself. Those are under
embargo until midday on Tuesday 26 July.
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
For about the 5th time in the last 6 months, Spamhaus has listed the
IPv6 address of our support ticket mail host as a spam source.
I have checked every outbound port 25 connection from that host and
verified that the only thing it sends is replies to support tickets.
The previous times this happened I was able to de-list the host, but
this time:
https://www.spamhaus.org/query/ip/2600%3A3c03%3A%3A31%3A2000
just says "invalid input.", so I can't de-list it this time.
Last time this happened I attempted to contact Spamhaus, both via
their web contact form and via Twitter, to ask for more info as to why
they keep listing this host. I have not received a response.
So, all I can conclude is that Spamhaus are wrong. Possibly someone
is automatically reporting ticket responses to them as spam. I can
only recommend not using their "zen" DNSBL for binary blocking
decisions.
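If you do still want to consult Spamhaus, one option is to use the
result as a scoring signal rather than an outright reject. A rough
Python sketch, IPv4 only for brevity (the score weight is just an
example, not a recommendation):

    import socket

    def dnsbl_listed(ipv4, zone="zen.spamhaus.org"):
        # DNSBL query: reverse the octets, append the zone, look up an
        # A record. Any answer (e.g. 127.0.0.2) means "listed".
        query = ".".join(reversed(ipv4.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:        # NXDOMAIN: not listed
            return False

    score = 0.0
    if dnsbl_listed("192.0.2.1"):
        score += 3.0    # add to a spam score; don't reject outright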
If anyone has any contacts at Spamhaus that do actually respond then
I would appreciate you putting me in touch.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Two-factor authentication for the BitFolk Panel and Xen Shell has
long been requested¹, and is finally now implemented. If you wish to
enable it please visit the "Security" section of the Panel to do so:
https://panel.bitfolk.com/account/security/
You should hopefully find the process straightforward. If you don't,
please let us know.
Thanks to the requesters and those who've been helping to test it
over the last ~week. For those who've been testing: I've disabled it
on your live account but left the key data there. If you don't want
to re-use the same key you can just invalidate it and generate
another one.
Cheers,
Andy
¹ https://tools.bitfolk.com/redmine/issues/117
PS It's worth looking over the list of outstanding feature requests
and seeing if there are any you'd like to vote for, as it's nice to
know what is important to you.
https://tools.bitfolk.com/redmine/issues?query_id=1 (log in with
your usual BitFolk credentials if you want to vote / update
anything)
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
For just short of 5 hours on Friday 29 April we lost connectivity to
80.0.0.0/8, i.e. every IPv4 address that starts with 80.
An email sent at the time:
On Fri, Apr 29, 2016 at 10:58:07AM +0000, Andy Smith wrote:
> So the list of affected prefixes so far is:
>
> 80.1.0.0/16
> 80.7.0.0/16
> 80.40.0.0/13
> 80.68.80.0/20
> 80.95.96.0/19
> 80.192.0.0/14
> 80.229.0.0/16
> 80.238.0.0/21
> 212.110.160.0/19
The last one now appears to have been listed in error and was
actually unaffected. The extent of the outage was of course all IP
destinations in 80/8 and not just the ones reported above.
This outage was the result of route filtering by our transit
provider Jump Networks. A more detailed explanation of how that
happened has been provided by Jump and is included below.
Jump's route filtering is controversial and has in the past caused
us (relatively minor) problems when a network removes a covering
route. It's something I've been putting pressure on Jump to improve
so I'm glad to see further development being put into it.
I am sometimes asked why we are single-homed to Jump rather than doing
our own BGP. I do not see it as being single-homed; I see it as being
part of Jump's network. If we were talking about being on the end of
a wire that was connected solely to Jump then that would be a
single-homed situation that I wouldn't find reasonable. However, our
connections to Jump's core network are redundant at every level, and
from there Jump has multiple transit providers and a pair of LINX
peering sessions.
It does mean, however, that we have to own the problems that happen
with Jump's network when they affect us, and not just try to pass them
off as "a fault with a supplier". I apologise for this outage and any
disruption that it caused you.
On the whole I am happy with Jump's network and don't think I could
do a better job in replicating it. I suspect that if I tried then
what we'd end up with is something that overall has less
reliability. The route filtering matter is the only thing I have
taken issue with; I will continue to push for improvements in
this area to reduce the possibility of any further problems of this
nature.
I appreciate there are some fairly technical details in this and in
the explanation below, so if any of it is unclear then do feel free to
ask further questions.
Regards,
Andy Smith
Director
BitFolk Ltd
--
http://bitfolk.com/ -- No-nonsense VPS hosting
From: "James A. T. Rice"
Date: Fri, 29 Apr 2016 23:42:48 +0100
Subject: [jump-announce] Connectivity loss from Jump to 80.0.0.0/8 earlier today
Connectivity loss from Jump to 80.0.0.0/8
=========================================
What happened
-------------
On 2016-04-29 from 0611 UTC until 1053 UTC we suffered a loss of
connectivity to the IPv4 80.0.0.0/8 range. IPv6 was unaffected.
The set of circumstances leading to this was:
1) On 2016-04-27 at 1318 UTC, Belgacom, AS6774 (Belgium's largest
telecommunications company), started announcing 80.0.0.0/8 into the
global BGP routing tables. Belgacom have no authority to announce
80.0.0.0/8.
2) Despite Belgacom having no authority to announce 80.0.0.0/8, our
upstreams, NTT, Level3, and GlobalCrossing, trusted these announcements
from Belgacom, accepting these routes into their routing tables.
Our upstreams normally have fairly strict route filters; however,
Belgacom appears to have been one of their 'trusted' peers that these
filters weren't applied to.
3) Jump in turn has trusted NTT/Level3/GlobalCrossing's route filters to
give us an authoritative view of what's supposed to be in the global
routing tables.
In this case the trust was too much, and we accepted 80.0.0.0/8, and with
our subsequent filter rebuilds stopped accepting de-aggregates of that
prefix.
We do have sanity checking in place in our filter building to help
prevent illegitimate aggregates like this from reducing the size of
our accepted prefix list too much; however, there were only ~1500
prefixes announced inside 80.0.0.0/8 and this reduction wasn't beyond
the sanity check limits we have.
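As a purely hypothetical illustration of how accepting the /8 caused
the de-aggregates to drop out of the filter (the prefixes and the
logic below are heavily simplified and are not our actual filter
builder), a covering aggregate can hide its de-aggregates like this:

    import ipaddress

    seen = [
        "80.0.0.0/8",      # bogus aggregate, announced in error
        "80.68.80.0/20",   # legitimate de-aggregates...
        "80.229.0.0/16",
    ]
    nets = [ipaddress.ip_network(p) for p in seen]

    # Keep only prefixes not covered by another accepted prefix.
    accepted = [n for n in nets
                if not any(n != o and n.subnet_of(o) for o in nets)]
    print(accepted)        # [IPv4Network('80.0.0.0/8')]
    # Once 80.0.0.0/8 is withdrawn, nothing in 80/8 is reachable
    # until the filter is rebuilt with the de-aggregates back in.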
4) On 2016-04-29 at 0611 UTC, Belgacom stopped announcing the illegitimate
80.0.0.0/8 prefix. Since Jump was no longer accepting the de-aggregates
within that range, we lost connectivity to 80.0.0.0/8.
We would have hoped to be alerted right away by our nlnog ring node,
which monitors when other nlnog ring nodes become unreachable;
however, out of 389 nodes only 6 were in 80.0.0.0/8, which wasn't
sufficient to trigger a problem report.
5) At 1053 UTC a manually triggered filter rebuild at Jump restored
connectivity to prefixes within 80.0.0.0/8.
NB Filter rebuilds normally happen three times a day, outside of working
hours, at 0030 UTC, 0730 UTC, and 1830 UTC.
Prevention
----------
1) Belgacom shouldn't misconfigure their BGP
Unfortunately we can't control this.
2) NTT/Level3/GlobalCrossing shouldn't trust Belgacom so much
Unfortunately we can't control this.
3) We shouldn't trust NTT/Level3/GlobalCrossing so much
It is suboptimal that illegitimate routing announcements and
misconfigurations at other ISPs can cause trouble for us - this is one of
the reasons we have the route filter system, as it almost eliminates the
risk of us accepting BGP prefix hijacks - hijacks typically involve
announcing a more specific prefix of a legitimate BGP route. However we're
keen to also prevent the problems we've currently 'traded' for the
problems the prefix filtering prevents.
We do have a developer working on the BGP route filtering system, and
one of the milestones already set is for 'genuine / necessary' routes
to persist in the allow list for 9 months after they were last
necessary, which will prevent situations like this. This will be in
place in the next few weeks.
I hope this explains the problems; my apologies for any inconvenience
this caused.
Sincerely
James Rice
Director
Jump Networks Ltd
Hi,
I've now added support for installing Ubuntu 16.04 to the Xen Shell.
Your instance of the Xen Shell will need to be version
1.48bitfolk36. If you have an old copy hanging around, log out of it
completely and SSH to it again.
I would not recommend attempting to use ZFS unless you have a lot
more than the default 1GiB memory (and then you'll still need a
separate ext4 /boot).
Here's an Asciinema recording of me installing a 64-bit,
encrypted-storage version of it in real time. I skipped the swap
device for speed, but you may not necessarily want to do that.
https://asciinema.org/a/8n7onshwzaziqhz7ewqmxcbcv
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
Between about 06:46 and 07:09 UTC today I was doing some work to
move the entropy service behind a VRRP IP, so users of it will have
seen some messages like:
    ekey-egd-linux: Reconnecting to EGD server
in their logs. As long as these do not continue past 07:09 there is
nothing to be concerned about.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting