In a previous life we ran a pair of HA load balancers, with a serial-connected heartbeat and STONITH, in a primary/backup configuration, as a front end to a whole bunch of ISP-type services.

These then pointed at the services (multiples of each). The published IPs on the load balancers were virtual/floating and moved between the two.

It gave us a lot of flexibility to adjust the load-balancing parameters, sometimes pressing them into service for fail-over of a back-end service, or dropping back-end services out of the list for maintenance or reloads. Doing this kills any established TCP sessions, though, and it is up to the client to re-establish them; the load balancers simply point the re-connection at a different service. The state is lost, and each re-connect is a new application session. For web servers or Squid proxies this does not matter much.

Basically, if the back-end services were load balanced off that front end, then when one failed at least some service was maintained via what remained, and we could drop the failed one out of the load-balancing list once we spotted it (Nagios monitoring). There is no reason why this could not be scripted; Monit as such was not available to us at that point in time.
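
As an illustration of the sort of thing I mean (a sketch only, assuming a front end like haproxy with its admin socket enabled, which is not what we were running; the backend and server names are made up):

#!/bin/sh
# Sketch: drop a failed back-end out of the pool from a monitoring
# event handler (e.g. called by Nagios). Assumes haproxy with
# "stats socket /run/haproxy/admin.sock level admin" configured and
# socat installed. Names are invented for illustration.
BACKEND=web
SERVER="$1"    # e.g. "web1", passed in by the monitoring system

echo "disable server ${BACKEND}/${SERVER}" | \
    socat stdio /run/haproxy/admin.sock

# To put it back after maintenance:
#   echo "enable server ${BACKEND}/${SERVER}" | socat stdio /run/haproxy/admin.sock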

This might be useful to offer as a paid-for ISP service, but it is overkill for a single request from a customer.

Alternatively, if virtual/floating IPs were available, there is no reason you could not run a similar setup on your pair of VPSes: STONITH running on the same pair that runs the services, a direct heartbeat interconnect over a private LAN/VLAN (or some such) if a dedicated serial heartbeat link is not available, and the two run as a primary/backup pair.
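
As a rough sketch of what the heartbeat side might look like (old Heartbeat v1 style configuration, from memory; node names, interface, floating IP and service name are all invented):

# Sketch only: two-node primary/backup pair using classic Linux-HA
# Heartbeat (v1 style haresources). STONITH configuration is omitted.
cat > /etc/ha.d/ha.cf <<'EOF'
# heartbeat over the private LAN/VLAN instead of a serial link
bcast eth1
keepalive 2
deadtime 15
auto_failback on
node vps1
node vps2
EOF

cat > /etc/ha.d/haresources <<'EOF'
# vps1 normally holds the floating IP and runs the front-end service
vps1 IPaddr::192.0.2.10/24/eth0 myservice
EOF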

But as suggested earlier, your provider would need to be offering those extra things (a heartbeat link and virtual/floating IPs).

You would of course need to keep the contents of the servers sufficiently synced in readiness for fail-over, and it would all have to reside with the same provider.
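
The syncing could be as crude as an rsync from cron on the standby every few minutes (hostname and path invented; DRBD or application-level replication would be the more robust options):

# Sketch: pull the primary's content onto the standby periodically,
# e.g. from cron. Anything not yet synced at fail-over time is lost.
rsync -a --delete primary.example.net:/srv/www/ /srv/www/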

If you tried anything like this across different providers (unlikely to be possible with floating/virtual IPs), there is a very real risk that network congestion could mess with your heartbeat link, with unforeseen results and loss of service.

Heartbeat and STONITH are (or were, last I looked) part of the Linux-HA suite linked earlier, though it has been a long while since I looked at them. YMMV.


Cheers


Kirbs



On 21/02/2019 14:08, Andy Smith wrote:
Hi Chris,

On Thu, Feb 21, 2019 at 01:19:14PM +0000, Chris Smith via users wrote:
> I’m exploring the idea of using two VPSs on different hosts to
> implement some sort of failover mechanism.  Is anyone here doing
> something similar, or have any recommendations?
I do it myself but I'm not aware of any customers doing it.

All solutions in this space are going to require paying for multiple
VPSes, and I guess that is the major turn-off for people.

As a customer you cannot yet programmatically float an IP address
between two different VPSes yourself, but it's what I do with the aid of
a script. I have asked in the past whether any customers wanted to
explore that, in which case I would be able to turn it into a service.
Probably for free, given that you need to pay for an extra VPS and at
least one extra IP.
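
Purely by way of illustration (this is not the actual script, and the interface and address here are invented), the guts of moving a floating IP onto the newly live host generally look something like:

#!/bin/sh
# Hypothetical sketch: take over a floating IP on this host.
VIP=203.0.113.10
DEV=eth0

# Bring the address up locally
ip addr add "${VIP}/24" dev "${DEV}"

# Send gratuitous ARP so upstream routers/neighbours learn the move
arping -c 3 -U -I "${DEV}" "${VIP}"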

A lot of this depends on what the very vague and high level term
"failover" means to you.

As one example architecture, I have two VMs, each of which runs haproxy.
The haproxy fronts various TCP services such as some web sites, spamd,
the entropy service, etc. There are multiple backend VMs running each
service.

Clients talk to the haproxy IP. The haproxy health-checks the backends
and decides where to proxy the client connection.
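
This isn't our exact config, but the general shape of such an haproxy setup is something like this (service names, ports and addresses all invented):

# Sketch: TCP-mode haproxy fronting two health-checked backends.
cat > /etc/haproxy/haproxy.cfg <<'EOF'
defaults
    mode tcp
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend www
    bind 192.0.2.10:80
    default_backend web

backend web
    balance roundrobin
    # "check" makes haproxy health-check each server; failed ones
    # stop receiving new connections
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
EOF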

There is also a keepalived on each haproxy host which, in the event of
the live haproxy host becoming unavailable, moves the floating IPs to
the other haproxy host. By this means it is possible for me to take some
of the backend VMs out of service without clients noticing (connected
clients will reconnect, however).
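
The keepalived side is just VRRP plus a check that haproxy is still running, roughly like this (router id, priorities and the floating address are invented, not the real values):

# Sketch: keepalived config on the normally-live haproxy host.
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # is haproxy still running?
    interval 2
}

vrrp_instance VI_1 {
    state MASTER                  # BACKUP on the standby host
    interface eth0
    virtual_router_id 51
    priority 150                  # lower (e.g. 100) on the standby
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24
    }
    track_script {
        chk_haproxy
    }
}
EOF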

So, downsides here:

- Added complexity, although once you understand them, haproxy and
  keepalived are pretty simple, sturdy pieces of software

- Have to pay for an extra VM sitting around doing nothing until it's
  needed. How much is the continuity worth to you, though? I mean,
  minimal BitFolk VM, £6.49+VAT/mo., arguably in many contexts I could
  charge more than that for writing this email… 😀

- It's all at BitFolk. You survive the death of a BitFolk host, but a
  lot of disruptions affect the entire colo provider / site.

Maybe that's excessive work / expenditure for the level of resilience
you desire though. An essential first step is deciding what it is you
want to achieve.

The main reason I put floating IPs in front of customer-visible services
is because if I don't then customers will see errors and problems when I
do maintenance work either on the services themselves or when I reboot a
whole BitFolk server. In truth I think that a half-hour unavailability
of spamd, apt-cacher, entropy, etc. is bearable, but I know I will get
complaints and queries about it, so they have floating IPs just so I
don't have to deal with that.

BitFolk's resolvers are a different matter. They can't be unavailable
for half an hour, or even minutes really. At the moment there are four of
them and they live behind a Pacemaker cluster that always ensures that
two of them are available, again by use of floating IPs. This is very
complex and in hindsight I wish I had not done this. keepalived probably
would have sufficed. This cluster needs replacing due to its constituent
VMs being obsolete OS versions and its next incarnation most likely will
not be a full Pacemaker cluster.
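
For the curious, a Pacemaker-managed floating IP is a resource along these lines (crmsh syntax; the name and address are invented, not the resolvers' real configuration):

# Sketch: define a floating IP resource that Pacemaker will keep
# running on exactly one cluster node and monitor every 10 seconds.
crm configure primitive resolver_ip_1 ocf:heartbeat:IPaddr2 \
    params ip=203.0.113.53 cidr_netmask=24 nic=eth0 \
    op monitor interval=10s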

Maybe you are only trying to beat the catastrophic failure of a piece of
BitFolk's hardware.

In such a case, we try to limit the downtime to a handful of hours. We
have spare hardware, and we hope that we could just have someone insert
the storage into a spare chassis and boot the server again. It becomes
trickier if we think of a case where both the SSDs are destroyed. In
that case your data is gone; we can boot the OS, but after many hours
all you would get is a clean VPS account.

In terms of resilience then, "have backups" is a really good second
step, as you can put your stuff back on BitFolk or any of the other
virtual machine hosting providers.

Beyond that, maybe you are thinking that if there were some sort of
storage snapshot you could at least boot your single VM on a different
piece of hardware pretty quickly resulting in only minutes of downtime
without having to pay for any extra VMs or doing anything particularly
complicated.

There currently is no such facility at BitFolk, though it doesn't seem
that hard to do. I could probably even implement migration with only a
suspend/restore while storage deltas are transferred (typically <10 secs
pause). The main issue with that from my point of view is that I still
need to reserve storage and memory for you on another host. The only
thing saved from my point of view is CPU, and we're not short of CPU. So
how can I make that service available without charging almost the same
as an extra VPS?

People using cloud providers often solve these problems by spinning up
new guests as and when needed. BitFolk is not a cloud provider and
pivoting into that space is probably not something that can happen any
time soon, if ever, so unfortunately exploration of solutions in that
direction will be limited.

Most VM-as-cheap-colo-style providers like BitFolk do not offer live
migration and migration-in-event-of-failure products, probably because
they would cost almost the same as two VMs to begin with. Many more do
offer IPs that you can float about between your VMs programmatically
and/or by API. The latter sounds a lot more appealing to me than the
former, but who knows, maybe it could be a way for BitFolk to
differentiate itself in the marketplace. Something that is sorely
needed.

As I say, knowing your requirements is going to be a bare minimum for
you to make progress here regardless of what provider you use, but also
on a personal level I would be interested to hear what your goals and
requirements are.

Cheers,
Andy


-- 
admins@sheffieldhackspace.org.uk
www.sheffieldhackspace.org.uk