In a previous life we ran a pair of HA load balancers, with a serial-connected heartbeat and STONITH, in a primary/backup config, as a front end to a whole bunch of ISP-type services.
These pointed to the services (multiple instances of each). The published IPs on the load balancers were virtual/floating and moved between the two machines.
It gave us a lot of flexibility to adjust the load-balancing parameters, sometimes pressing the same into service for fail-over of a back-end service, or dropping back-end services out of the list for maintenance or reloads. Doing this kills any TCP sessions, though, and it is up to the client to re-establish them; the load balancers simply point the client at a different service when the reconnect happens. State is lost, however, and each reconnect is a new application session. For web servers or Squid proxies this does not matter much.
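To illustrate what the client side has to cope with, here is a reconnect-and-retry loop along those lines; the host, port and retry policy are made-up values for illustration, not anything from our old setup:

import socket
import time

def request_with_retry(host, port, payload, attempts=3, delay=1.0):
    """Send payload over TCP, reconnecting if the session is killed.

    Each retry is a brand-new TCP connection, so any server-side state
    from the previous connection is gone; the application has to treat
    every reconnect as a fresh session.
    """
    last_error = None
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=5) as sock:
                sock.sendall(payload)
                return sock.recv(4096)
        except OSError as exc:  # reset, refused, timed out, ...
            last_error = exc
            time.sleep(delay)  # give the balancer time to repoint us
    raise last_error  # all attempts failed

# e.g. request_with_retry("192.0.2.10", 8080, b"PING\n")
# (192.0.2.10 standing in for the balancers' floating IP)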
Basically, if the back-end services were load balanced off that front end, then when one failed at least some service was maintained via what remained, and we could drop the failed one out of the load-balancing list once we spotted it (via Nagios monitoring). There is no reason why this could not be scripted; Monit as such was not available to us at that point in time.
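If you were building this today with something like haproxy, that scripted step could talk to haproxy's runtime API over its stats socket. A rough sketch in Python, assuming a socket configured at /var/run/haproxy.sock with "level admin", and a hypothetical backend/server pair web_back/web1:

import socket

HAPROXY_SOCKET = "/var/run/haproxy.sock"  # assumed path; needs "level admin"

def haproxy_cmd(command):
    """Send one command to haproxy's runtime API and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(HAPROXY_SOCKET)
        sock.sendall(command.encode() + b"\n")
        return sock.recv(65536).decode()

# "disable server" / "enable server" are real runtime API commands;
# the backend and server names are made up for this example.
def drop_server(backend, server):
    return haproxy_cmd(f"disable server {backend}/{server}")

def restore_server(backend, server):
    return haproxy_cmd(f"enable server {backend}/{server}")

# e.g. called from a Nagios event handler when web1 goes sick:
# drop_server("web_back", "web1")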
This might be useful as a paid-for service for an ISP to offer, but it is overkill for a single request from a customer.
Alternatively, if virtual/floating IPs were available, there is no reason you could not run a similar setup on your pair of VPSs, with STONITH running on the same pair that runs the services, a direct heartbeat interconnect over a private LAN/VLAN or some such if a dedicated heartbeat serial link is not available, and the two running as a primary/backup pair.
But as suggested earlier, your provider would need to offer those extras (a heartbeat link and virtual/floating IPs).
You would, of course, need to keep the contents of the servers sufficiently synced in readiness for fail-over, and it would all have to reside at the same provider.
If you tried anything like this across different providers (unlikely to be possible with floating/virtual IPs), there is a very real risk that network congestion could interfere with your heartbeat link, with unforeseen results and loss of service.
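The IP-takeover half of that is conceptually small; the HA tooling mostly exists to decide safely *when* to do it. A bare-bones Python sketch of what the backup node does when it claims the floating address (the address, prefix and interface name are made up, it needs root, and your provider has to actually route the IP to either machine):

import subprocess

FLOATING_IP = "192.0.2.10"  # hypothetical floating address
PREFIX_LEN = 24
INTERFACE = "eth0"          # hypothetical interface name

def take_over_ip():
    """Claim the floating IP on this node (what heartbeat automates)."""
    # Attach the address to our interface (requires CAP_NET_ADMIN).
    subprocess.run(
        ["ip", "addr", "add", f"{FLOATING_IP}/{PREFIX_LEN}", "dev", INTERFACE],
        check=True,
    )
    # Gratuitous ARP so neighbours update their caches and start
    # sending traffic for the floating IP to this machine.
    subprocess.run(
        ["arping", "-U", "-c", "3", "-I", INTERFACE, FLOATING_IP],
        check=True,
    )

def release_ip():
    """Drop the floating IP, e.g. when handing back to the primary."""
    subprocess.run(
        ["ip", "addr", "del", f"{FLOATING_IP}/{PREFIX_LEN}", "dev", INTERFACE],
        check=True,
    )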
Heartbeat and STONITH are (or were, last I looked) part of the Linux-HA suite linked earlier. It was a long while ago that I last looked at them, though. YMMV.
Cheers
Kirbs
Hi Chris,

On Thu, Feb 21, 2019 at 01:19:14PM +0000, Chris Smith via users wrote:

> I'm exploring the idea of using two VPSs on different hosts to
> implement some sort of failover mechanism. Is anyone here doing
> something similar, or have any recommendations?

I do it myself, but I'm not aware of any customers doing it. All solutions in this space are going to require paying for multiple VPSes, and I guess that is the major turn-off for people.

As a customer you cannot yet programmatically float an IP address between two different VPSes by yourself, but it's what I do with the aid of a script. I have asked in the past whether any customers wanted to explore that, in which case I would be able to turn it into a service. Probably for free, given that you need to pay for an extra VPS and at least one extra IP.

A lot of this depends on what the very vague and high-level term "failover" means to you.

As one example architecture, I have two VMs, each of which runs haproxy. The haproxy fronts various TCP services such as some web sites, spamd, the entropy service, etc. There are multiple backend VMs running each service. Clients talk to the haproxy IP. haproxy health-checks the backends and decides where to proxy each client connection. There is also a keepalived on each haproxy host which, in the event of the live haproxy host becoming unavailable, moves the floating IPs to the other haproxy host. By this means it is possible for me to take some of the backend VMs out of service without clients noticing (connected clients will reconnect, however).

So, downsides here:

- Added complexity, although once you understand them, haproxy and keepalived are pretty simple, sturdy pieces of software.

- You have to pay for an extra VM sitting around doing nothing until it's needed. How much is the continuity worth to you, though? I mean, a minimal BitFolk VM is £6.49+VAT/mo; arguably in many contexts I could charge more than that for writing this email… 😀

- It's all at BitFolk. You survive the death of a BitFolk host, but a lot of disruptions affect an entire colo provider / site.

Maybe that's excessive work / expenditure for the level of resilience you desire, though. An essential first step is deciding what it is you want to achieve.

The main reason I put floating IPs in front of customer-visible services is that if I don't, customers will see errors and problems when I do maintenance work, either on the services themselves or when I reboot a whole BitFolk server. In truth I think that a half-hour unavailability of spamd, apt-cacher, entropy etc. is bearable, but I know I will get complaints and queries about it, so they have floating IPs just so I don't have to deal with that.

BitFolk's resolvers are a different matter. They can't be unavailable for half an hour, or even minutes really. At the moment there are four of them, and they live behind a Pacemaker cluster that always ensures two of them are available, again by use of floating IPs. This is very complex and in hindsight I wish I had not done it; keepalived probably would have sufficed. This cluster needs replacing due to its constituent VMs running obsolete OS versions, and its next incarnation most likely will not be a full Pacemaker cluster.

Maybe you are only trying to beat the catastrophic failure of a piece of BitFolk's hardware. In such a case we try to limit the downtime to a handful of hours: we have spare hardware, and we hope that we could just have someone insert the storage into a spare chassis and boot the server again.
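Going back to the keepalived part for a moment, in case a concrete example helps: keepalived can run a periodic health-check command (its vrrp_script mechanism) and, when the check starts failing, lower the node's priority so the floating IPs move to the peer. A minimal check in Python; the address and port are assumptions for illustration, and checking haproxy's stats socket instead would work just as well:

#!/usr/bin/env python3
"""Exit 0 if haproxy accepts TCP connections, non-zero otherwise.

Meant to be run by keepalived as a periodic check command; on repeated
failure keepalived demotes this node and the floating IPs move to the
peer. The address and port below are made up for illustration.
"""
import socket
import sys

CHECK_ADDR = ("127.0.0.1", 80)  # hypothetical haproxy frontend

def main():
    try:
        with socket.create_connection(CHECK_ADDR, timeout=2):
            return 0  # healthy: connection accepted
    except OSError:
        return 1      # unhealthy: let keepalived fail over

if __name__ == "__main__":
    sys.exit(main())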
Hardware-wise, it becomes trickier if we think of a case where both of the SSDs are destroyed. In that case your data is gone: we can boot the OS, but after many hours all you would get is a clean VPS account. In terms of resilience, then, "have backups" is a really good second step, as you can put your stuff back on BitFolk or any of the other virtual machine hosting providers.

Beyond that, maybe you are thinking that if there were some sort of storage snapshot, you could at least boot your single VM on a different piece of hardware pretty quickly, resulting in only minutes of downtime, without having to pay for any extra VMs or do anything particularly complicated. There is currently no such facility at BitFolk, though it doesn't seem that hard to do. I could probably even implement migration with only a suspend/restore while storage deltas are transferred (typically a pause of under 10 seconds). The main issue with that, from my point of view, is that I still need to reserve storage and memory for you on another host. The only thing saved, from my point of view, is CPU, and we're not short of CPU. So how can I make that service available without charging almost the same as for an extra VPS?

People using cloud providers often solve these problems by spinning up new guests as and when needed. BitFolk is not a cloud provider, and pivoting into that space is probably not something that can happen any time soon, if ever, so unfortunately exploration of solutions in that direction will be limited.

Most VM-as-cheap-colo-style providers like BitFolk do not offer live migration or migration-in-event-of-failure products, probably because they would cost almost the same as two VMs to begin with. Many more do offer IPs that you can float about between your VMs programmatically and/or by API. The latter sounds a lot more appealing to me than the former, but who knows, maybe it could be a way for BitFolk to differentiate itself in the marketplace, something that is sorely needed.

As I say, knowing your requirements is going to be a bare minimum for you to make progress here, regardless of which provider you use, but also on a personal level I would be interested to hear what your goals and requirements are.

Cheers,
Andy
--
admins@sheffieldhackspace.org.uk
www.sheffieldhackspace.org.uk