Hi all,
I've come across a strange issue partway through renumbering: a new
IP only responds to some hosts on the Internet, and itself seems able
to reach only some destinations.
The current config (mid-way through the change) is:
ip -4 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 212.13.194.116/23 brd 212.13.195.255 scope global eth0
inet 85.119.82.116/21 scope global eth0
ip -4 route show
212.13.194.0/23 dev eth0 proto kernel scope link src 212.13.194.116
85.119.80.0/21 dev eth0 proto kernel scope link src 85.119.82.116
default via 85.119.80.1 dev eth0 metric 100
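(For reference, the kernel's choice of source address for a given
destination can be checked with "ip route get", and a specific source
can be forced with ping's -I option; the commands below are
illustrative rather than captured output.)

    # which source IP the kernel picks for this destination
    ip route get 8.8.8.8

    # force each source address in turn and compare behaviour
    ping -c 3 -I 212.13.194.116 8.8.8.8
    ping -c 3 -I 85.119.82.116 8.8.8.8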
My new IP 85.119.82.116 is pingable from some devices but not others
(the old IP responds to all of them).
Outgoing traffic is also affected; for example, I can't connect to
Google's Public DNS server 8.8.8.8:
traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 85.119.80.17 0.180 ms 0.213 ms 0.193 ms
2 212.13.194.4 0.466 ms 0.593 ms 0.613 ms
3 194.153.169.233 2.631 ms 2.545 ms 2.526 ms
4 134.222.231.29 15.405 ms 15.388 ms 15.366 ms
5 * * *
6 * * *
7 * * *
There are lots of other IPs I can and can't connect to (for example
158.43.240.4 works fine); I've not spotted a pattern yet.
Incoming traceroutes look odd too:
traceroute -n 85.119.82.116
traceroute to 85.119.82.116 (85.119.82.116), 30 hops max, 60 byte packets
1 212.64.153.2 0.581 ms 0.595 ms 0.622 ms
2 92.52.77.60 136.363 ms 136.371 ms 136.409 ms
3 92.52.76.198 0.178 ms 0.196 ms 0.193 ms
4 77.67.75.181 0.271 ms 0.289 ms 0.264 ms
5 89.149.185.165 1.324 ms 89.149.183.178 1.317 ms 89.149.185.230 1.295 ms
6 195.66.224.138 2.021 ms 1.965 ms 2.032 ms
7 129.250.5.25 2.226 ms 2.227 ms 1.944 ms
8 194.153.169.242 2.068 ms 194.153.169.241 2.133 ms 194.153.169.242 2.097 ms
9 85.119.80.16 1.988 ms 1.996 ms 85.119.80.17 2.117 ms
If you look at hop 9, that's two different hosts responding within
the same traceroute. Is it possible someone else set up a host using
my new IP at some point, messed up the routing, and then switched
their host's IP?
I've obviously done something silly, but I must be going blind.
Any ideas?
Ewan
Hello,
If you're going to the Ubuntu release party in London tonight:
http://loco.ubuntu.com/events/ubuntu-uk/1283/detail/
then do track me down and say hello. :)
And no, you can't yet use an official installer to put Oneiric Ocelot
(11.10) on a BitFolk VPS, but I hope to have that available this
weekend. With the usual warnings about not rushing into upgrades.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
"I'd be happy to buy all variations of sex to ensure I got what I wanted."
-- Gary Coates (talking about cabling)
Hi,
entropy.lon.bitfolk.com [212.13.194.102] has been renumbered to
85.119.80.215. If you're making use of our free entropy service[1] then
you may be referring to this host by IP address in your
configuration[2].
Use of the entropy service is not configured by default, so if you
don't know what I'm talking about then you most likely don't have
anything to change.
The old IP address will continue to respond until Sunday 30th
October 2011, at which point we will be taking it out of service.
If you're currently referring to it by IP address then you should
change to the host name now.
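For example, on Debian/Ubuntu a quick check might look like the
following (just a sketch; your config may live elsewhere, see [2]):

    # look for the old IP in the egd client config; if found, change
    # it to the host name entropy.lon.bitfolk.com
    grep -n '212\.13\.194\.102' /etc/default/ekeyd-egd-linux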
A summary of all renumberings currently in progress can be found
here:
https://tools.bitfolk.com/wiki/Renumbering_out_of_212.13.194.0/23
If you have any questions, please reply to the list or to
support@bitfolk.com.
Cheers,
Andy
[1] https://tools.bitfolk.com/wiki/Entropy
[2] Usually /etc/default/ekeyd-egd-linux on Debian/Ubuntu
--
http://bitfolk.com/ -- No-nonsense VPS hosting
"I am the permanent milk monitor of all hobbies!" -- Simon Quinlank
Hello,
We're now very close to the point when we need to begin getting you
all to renumber your IP addresses. We are not there yet, but here is
some more info:
https://tools.bitfolk.com/wiki/Renumbering_out_of_212.13.194.0/23
Before finalising the procedure and sending out instructions I could
use some feedback from you on a couple of points. These involve
situations where BitFolk is contacting you by IP address.
- BitFolk Backups
Our backup servers are doing rsync-over-SSH by IP address.
If we change the IP addresses our side then your backups stop
working until you make your change.
If we don't change the IP addresses our side then your backups
will break when you make your change, and then you would need to
contact support to get the change done.
Which would be preferred?
- Nagios checks
Many of you have Nagios checks. These are done by IP address.
If we immediately make the change on our side then alerts will
start to fire for you, and will not right themselves until you
complete the IP address change on your side.
If we don't make the change on our side then as soon as you make
your change, alerts will start to fire and then you'd have to
contact support to ask for the monitoring to be fixed.
I have a strong preference for us making the change as soon as you
have the info to make the change on your side, so that the
monitoring fixes itself as you do the work.
Any other ideas how to do it?
- Old rsync-style zone transfers
Some of you still have DNS secondary services driven by zone files
that we are rsyncing from you. This is happening by IP address.
This service was deprecated years ago, since everyone got enough
RAM to run a proper DNS server for this purpose.
I would like to finally retire this service. How much time would
people like in order to stop relying on it and switch to running a
real DNS server?
- DNS slaves
Those of you who have secondary DNS services will be running your
own DNS server on your VPS and we'll be doing AXFR to that IP
address.
The DNS secondary service comes with Nagios monitoring, so as soon
as we switch the IP configured on our side then monitoring as
mentioned above will begin to alert.
There can be multiple masters, so what we could do is set both old
and new IP addresses so that zone transfers can continue to take
place, but our monitoring can only have one IP address.
With that in mind I think I would prefer to set both old and new
IPs as masters (see the sketch after this section), let monitoring
alert for your TCP/53, and have that fix itself as you sort out
your new IP address.
Any other ideas how to do it?
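Purely as an illustration of the multiple-masters idea (assuming
BIND on our side; the zone name and IP addresses below are
placeholders, not anyone's real configuration):

    zone "customer-example.org" {
        type slave;
        file "/var/cache/bind/customer-example.org";
        // old and new addresses of the customer's master
        masters { 212.13.194.200; 85.119.80.200; };
    };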
As far as possible in all cases above I would like to avoid the
situation where we have to chase individual customers one-on-one
about things that suddenly stopped working.
I can imagine this might not be avoidable—particularly in the case
of the backups—so if we have to do it, we have to do it.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi,
spamd.lon.bitfolk.com [212.13.194.5] has been renumbered to
85.119.80.248. If you're making use of our free spamd service then
you may be referring to this host by IP address in your mail server
(or similar) config.
The old IP address will continue to respond until Tuesday 18th
October 2011, at which point we will be taking it out of service.
You should start using the new IP address now.
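A quick way to satisfy yourself that the new address is reachable
before switching your config (assuming you talk to spamd on its
default port, 783):

    nc -zv 85.119.80.248 783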
If you have any questions, please reply to the list or to
support@bitfolk.com.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
> The optimum programming team size is 1.
Has Jurassic Park taught us nothing?
-- pfilandr
Hi all,
T-Mobile are being quite insistent that the problems I have sending
email to port 587 with four different providers whilst on 3G are not
at their end.
I believe I already know the answer to this, but can someone from
BitFolk please confirm that you don't block traffic from T-Mobile 3G
IP addresses on port 587...?
(Has anyone come across this issue before? It seems like they have
some weird filtering that drops the SMTP connection right after
STARTTLS - it even happens on other port numbers.)
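For anyone wanting to test the same thing from a 3G connection,
something like this should show whether the session survives
STARTTLS (the hostname is a placeholder for your provider's
submission server):

    openssl s_client -connect smtp.example.org:587 -starttls smtp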
Thanks
Joseph
Hello,
If BitFolk does not provide authoritative DNS services for one or
more of your domains then you can ignore this email.
One of our authoritative nameservers, a.authns.bitfolk.com, has been
renumbered from 212.13.194.70 to 85.119.80.222.
Normally you would only refer to this nameserver by name, so most of
you will not need to make any modifications to DNS zones or
registrar settings. If you have for some reason created records in
your zones that point at 212.13.194.70 then it's now time to change
these.
Many of you are probably restricting zone transfer by IP address.
Zone transfers will continue to come from 212.13.194.70 until Monday
17th October 2011, at which point we will switch to sourcing them
from 85.119.80.222. Please add 85.119.80.222 to your ACLs now,
without removing 212.13.194.70.
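If you happen to use BIND, that change might look something like
this (the zone name and file path are placeholders for your own
configuration):

    zone "example.com" {
        type master;
        file "/etc/bind/db.example.com";
        // keep the old source address alongside the new one for now
        allow-transfer { 212.13.194.70; 85.119.80.222; };
    };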
If you have any questions about this, please do reply to the list or
to support@bitfolk.com.
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hi folks,
I'm afraid that we've encountered a problem on obstler.bitfolk.com
where VPSes can't be started again once they've been shut down, and
we're going to have to reboot the host to fix this. We need to do it
as soon as possible as otherwise anyone who shuts their VPS down
will not be able to start it again.
I'm going to start an orderly VPS shutdown in a moment, which will
be followed by a host reboot and VPS startup again. This can take a
while because shutdown/startup happens serially. I'll follow up
again with more details.
10:52:08 up 230 days, 13:05, 4 users, load average: 1.17, 1.08, 0.96
Thanks, and apologies for the disruption.
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
Hello,
This email is a bit of a ramble about block device IO and SSDs and
contains no information immediately relevant to your service, so
feel free to skip it.
In considering what the next iteration of BitFolk infrastructure
will be like, I wonder about the best ways to use SSDs.
As you may be aware, IO load is the biggest deal in virtual hosting.
It's the limit everyone hits first. It's probably what will dismay
you first on Amazon EC2. Read
http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf
or at least pages 8, 29 and 30 of it.
Usually it is IO load that tells us when it's time to stop putting
customers on a server, even if it has a bunch of RAM and disk
space left. If disk latency gets too high everything will suck,
people will complain and cancel their accounts. When the disk
latency approaches 10ms we know it's time to stop adding VMs.
Over the years we've experimented with various solutions. We built
a server with 10kRPM SAS drives, and that works nicely, but the
storage then costs so much that it's just not economical.
After that we started building bigger servers with 8 disks instead of
4, and that's where we are now. This worked out, as we can usually
get around twice as many VMs on one server, and it saves having to
pay for an extra chassis, motherboard, PSUs and RAID controller.
SSD prices have now dropped enough that it's probably worth looking
at how they can be used here. I can think of several ways to go:
- Give you the option of purchasing SSD-backed capacity
=====================================================
Say SSD capacity costs 10 times what SATA capacity does. For any
additional storage you might like to purchase, you would get to
choose between 5G of SATA-backed storage or 0.5G of SSD-backed
storage, at the same price for either.
Advantages:
- The space is yours alone; you get to put what you like on it. If
you've determined where your storage hot spots are, you can put
them on SSD and know they're on SSD.
Disadvantages:
- In my experience most people do not appreciate choice; they just
want it to work.
Most people aren't in a position to analyse their storage use
and find hot spots. They lack either the inclination or the
capability or both - the service is fine until it's not.
- It means buying two expensive SSDs that will spend most of their
time being unused.
Two required because they'll have to be in a RAID-1.
Most of the time unused because the capacity won't be sold
immediately.
Expensive because they will need to be large enough to cater to
as large a demand as I can imagine for each server.
Unfortunately I have a hard time guessing what that demand would
be like so I'll probably guess wrong.
- Find some means of using SSDs as a form of tiered storage
=========================================================
We could continue deploying the majority of your storage from SATA
disks while also employing SSDs to cache these slower disks in
some manner.
The idea is that frequently-accessed data is backed on SSD whereas
data that is accessed less often is left on the larger-capacity
SATA, and *this remains transparent to the end user*, i.e. the VM.
This is not a new idea; plenty of storage hardware already does
it, ZFS can do it and so can BTRFS.
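Just as an illustration of how transparent this can be (not what
we'd deploy here, since as noted below we can't use ZFS): ZFS does
it by adding an SSD to an existing pool as a cache (L2ARC) device.

    # add an SSD as a cache device to pool "tank"
    # (pool and device names are placeholders)
    zpool add tank cache /dev/sdX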
Advantages:
- For whatever benefit there is, everyone gets to feel it. If done
right, any VM that needs more IOPs should get more IOPs.
- Expensive SSDs purchased can be used immediately, in full.
Disadvantages:
- Since we can't use ZFS or expensive storage hardware, any
short-term solution is likely to be rather hacky. Do we want to
be pioneers here? This is your data.
- Customers with VMs that don't have heavy IO requirements (most)
will be subsidising those who *do* have heavy IO requirements.
It's very unlikely we will put prices up, but SSDs are not free
so it has the effect of delaying the usual progression of
more-for-less that this type of service goes through.[1]
- Beyond what might be quite a blunt instrument, customers will
have no way to request faster storage and rely on it being
present. You have "some storage" and if that storage isn't
performing as fast as you would like, all we would be able to do
is try to see why it's not being cached on SSD.
- Both?
=====
Perhaps there is some way to do both? Maybe start out using the
whole SSD as cache, then shrink the cache as requests to purchase
SSD-backed storage come in?
Advantages:
- Again everyone feels the benefit immediately and hardware isn't
wasted.
- If the customer needs to buy SSD-backed storage then they can.
Disadvantages:
- If the caching is good enough then no one would feel the need to
buy SSD anyway, so why add complexity?
Questionable:
- If people buy all of the SSD, does that reduce caching benefit
to zero and suddenly screw everyone else over?
Presumably SSD-backed storage could be priced such that if a lot
of people did buy it, it would be economical to go out and buy a
pair of larger ones and swap them over without downtime[2].
So, if anyone has any thoughts on this I'd be interested in hearing
them.
If you had an IO latency problem, would you know how to diagnose it
to determine that it was something you were doing as opposed to
"BitFolk's storage is overloaded but it's not me"?
If you could do that, would you be likely to spend more money on
SSD-backed storage?
If we came to you and said that your VPS service was IO-bound and
would run faster if you bought some SSD-backed storage, do you think
that you would?[3]
My gut feeling at the moment is that while I would love to be
feeding the geek inside everyone and offering eleventy-billion
choices, demand for SSD-backed storage at an additional cost will be
low.
I also think it's going to be very difficult for an admin of a
virtualised block device to tell the difference between:
"All my processes are really slow at talking to storage; it's
because of my process ID 12345 which is a heavy DB query"
and:
"All my processes are really slow at talking to storage; that's
definitely a problem with BitFolk's storage and not anything I
am doing."
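A partial answer, as a sketch: inside the VM, iostat's extended
stats and iotop at least show whether the latency lines up with your
own processes' traffic. High await alongside a busy local process
points at your own workload; high await with next to no local
traffic points at contention below you.

    # per-device latency/utilisation every 5 seconds (sysstat package)
    iostat -xk 5
    # only show processes currently doing IO (iotop package)
    iotop -o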
By the way, I think we've done reasonably well at keeping IO latency
down, over the years:
barbar: http://tools.bitfolk.com/cacti/graphs/graph_1634_6.png
bellini: http://tools.bitfolk.com/cacti/graphs/graph_2918_4.png
cosmo: http://tools.bitfolk.com/cacti/graphs/graph_2282_4.png
curacao: http://tools.bitfolk.com/cacti/graphs/graph_1114_6.png
dunkel: http://tools.bitfolk.com/cacti/graphs/graph_1485_6.png
faustino: http://tools.bitfolk.com/cacti/graphs/graph_1314_6.png
kahlua: http://tools.bitfolk.com/cacti/graphs/graph_1192_6.png
kwak: http://tools.bitfolk.com/cacti/graphs/graph_1113_6.png
obstler: http://tools.bitfolk.com/cacti/graphs/graph_1115_6.png
president: http://tools.bitfolk.com/cacti/graphs/graph_2639_4.png
urquell: http://tools.bitfolk.com/cacti/graphs/graph_2013_6.png
(Play at home quiz: which four of the above do you think have eight
disks instead of four? Which one has four 10kRPM SAS disks? Answers
at [4])
In general we've found that keeping the IO latency below 10ms keeps
people happy.
There have been short periods where we've failed to keep it below
10ms and I'm sure that many of you can remember times when you've
found your VPS sluggish. Conversely I suspect that not many
customers can think of times when their VPSes have been the *cause*
of high IO load, yet high IO load is in general only caused by
customer VMs! So for every time you have experienced this, someone
else was causing it![5]
I think that, being in the business of providing virtual
infrastructure at commodity prices, we can't really expect too many
people to want or be able to take the time to profile their storage
use and make a call on what needs to be backed by SATA or SSD.
I think we first need to try to make it as good as possible for
everyone, always. There may be a time in the future where it's
commonplace for customers to evaluate storage in terms of IO
operations per second instead of gigabytes, but I don't think we are
there yet.
As for the "low-end customers subsidise higher-end customers"
argument, that's just how shared infrastructure works and is already
the case for many existing metrics, so what's one more? While we
continue to lack a good way to ration out IO capacity, it is
difficult to add it as a line item.
So, at the moment I'm more drawn to the "both" option but with the
main focus being on caching with a view to making it better for
everyone, and hopefully overall reducing our costs. If we can sell
some dedicated SSD storage to those who have determined that they
need it then that would be a bonus.
Thoughts? Don't say, "buy a big SAN!" :-)
Cheers,
Andy
[1] You know, when we double the RAM or whatever but keep the price
to you the same.
[2] Hot swap trays plus Linux md = online array grow. In theory.
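    (Roughly, and only as a sketch with placeholder device names:
    swap each member for a larger disk, let it resync, then grow.)

        mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
        mdadm /dev/md0 --add /dev/sdc1    # larger disk; wait for resync
        # ...repeat for each member, then:
        mdadm --grow /dev/md0 --size=max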
[3] "Nice virtual machine you have here. Would be a real shame if
the storage latency were to go through the roof, yanno? We got
some uh… extras… that can help you right out of that mess.
Pauly will drop by tomorrow with an invoice."
— Tony Soprano's Waste Management and Virtual Server Hosting,
Inc.
[4] echo "oryyvav, pbfzb, cerfvqrag naq hedhryy unir rvtug qvfxf.
oneone unf sbhe FNF qvfxf." | rot13
[5] Barring *very* occasional problems like a disk broken in such a
way that it doesn't die but delays every IO request, or a
battery on a RAID controller going defective, which disables the
write cache.
--
http://bitfolk.com/ -- No-nonsense VPS hosting