Re: [bitfolk] What to do when a customer's backups go above 100%

30 Dec 2013

Hi Andy

Personally I'd send a warning at 80%, a critical at 90% and fail any backups
that would take usage beyond 100%.

If you could give users the ability to delete either an entire backup, or
all backups of a specific file/directory, then it is their problem.

I'm not personally fond of the idea of billing people whose backups grow
beyond their limit (it might require a check of the T&Cs they agreed to),
but whatever you choose, it certainly shouldn't load you with work when a
customer uses more than they've paid for!

Nige

...
  Hi Rodrigo,

 On Mon, Dec 30, 2013 at 12:44:59PM -0200, Rodrigo Campos wrote:
  On Monday, December 30, 2013, Andy Smith wrote:
  - Nagios sends warnings when that usage goes above 95%, sends
    critical alerts if it goes above 100%

 Critical on 100% is maybe too late? 
 Having just checked it is actually 95% for warning and 99% for
 critical.

  I would say maybe ~90% can be critical, as you are clearly running out
  of space and won't be able to backup anymore.
 That is a fair point although the words "warning" and "critical" are
 at the moment just words used in template text in the alert and
 there is therefore no significance between them except what the
 recipient reads.

 Also doubtless different people will consider different percentages
 to be what they want.

 There isn't a concept of "running out of space" at the moment - if
 you go above 100% then your backups still work. You just eventually
 get asked by me to pay for more space or have some stuff deleted.

 Perhaps it is best if the critical alerts stay at 99% and I allow
 the warning percentage to be configurable.
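
As a rough illustration of a fixed critical level with a configurable
warning level, here is a minimal Python sketch. The function name and
signature are hypothetical and this is not the actual Nagios check; only
the 95% and 99% defaults come from the figures above.

```python
def classify_usage(used_bytes, quota_bytes, warn_pct=95.0, crit_pct=99.0):
    """Return 'OK', 'WARNING' or 'CRITICAL' for a backup quota.

    warn_pct is the per-customer configurable part; crit_pct is meant to
    stay at its default. Both names are assumptions of this sketch.
    """
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    if warn_pct > crit_pct:
        raise ValueError("warn_pct must not exceed crit_pct")
    pct = 100.0 * used_bytes / quota_bytes
    # "Above" is taken to mean strictly greater than the threshold.
    if pct > crit_pct:
        return "CRITICAL"
    if pct > warn_pct:
        return "WARNING"
    return "OK"


# Default thresholds, then a customer who prefers an earlier warning.
print(classify_usage(96, 100))
print(classify_usage(85, 100, warn_pct=80.0))
```

Since the two words are only template text, the same function could just as
well return whatever wording the recipient prefers.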

  Or if the size of the last backup times 7 (like in a week you won't be
  able to backup anymore) is more than x%, then critical. But maybe is a
  pita to get the size of the last backup?
 It doesn't really work like this as there isn't a concept of "size
 of last backup" - only files that change are backed up, so if you
 had ten 1GB files that did not change since last time then the usage
 would be 10GB even though there are two sets of backups.

 If one file changed then both versions would be stored, so the usage
 across both sets of backups would be 11GB. So, there is a
 *differential* of 1GB per backup run, and it is true that I could
 take note of this and compare it to how much space is left then
 guess how many of these backup runs would fit given the same amount
 of diffs every time.

 That is really complicated though and I'm not convinced there'd be
 very much value in this compared to just the used percentage.
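
To make the arithmetic above concrete, here is a small Python sketch of the
accounting and of the "how many more runs would fit" guess. It is only a
model under stated assumptions: the data layout, the function names and the
20GB quota in the usage lines are all hypothetical, and it ignores old
versions ageing out, which would free space again.

```python
def stored_usage(backup_sets):
    """Total size of the distinct file versions across all backup sets.

    Each set maps path -> (version_id, size_gb). An unchanged file has the
    same version_id in every set, so it is only counted once.
    """
    seen = {}
    for snapshot in backup_sets:
        for path, (version, size_gb) in snapshot.items():
            seen[(path, version)] = size_gb
    return sum(seen.values())


def runs_remaining(quota_gb, used_gb, diff_per_run_gb):
    """Guess how many more runs fit if every run adds the same diff."""
    if diff_per_run_gb <= 0:
        return None  # nothing is changing, so there is no useful estimate
    free_gb = quota_gb - used_gb
    return max(0, int(free_gb // diff_per_run_gb))


# Ten 1GB files, backed up twice with no changes.
first = {f"/data/file{i}": (1, 1) for i in range(10)}
unchanged = dict(first)
print(stored_usage([first, unchanged]))

# The same two sets, but one file changed before the second run.
changed = dict(first)
changed["/data/file0"] = (2, 1)
used = stored_usage([first, changed])
print(used, runs_remaining(20, used, 1))
```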

  If you have the size of the last backup, is it possible to add a check
  to see if the current backup is X% more than the last one?

 This seems to me (and I'm totally inexperienced and have never dealt
 with this) like something that can detect early when something got
 backed up when it shouldn't?
 While possible, these just sound like more alerts that people are not
 going to be very interested in. For those who do use the backups
 service, do you feel that a simple percent used alert isn't good
 enough and you need to know about rates of change?

  But in any case, the most reasonable thing to do for me is to abort the
  next backups until there is free space.
 I'm not sure that is reasonable, and I will explain why below.

   Note that although "just suspend the customer's backups as soon as
 they go past 100%" initially sounds like a good idea, it may not be
 as it prevents the customer from removing whatever it was they
 backed up that they didn't mean to, i.e. fixing it themselves.
 Sorry, don't follow you here :-S 
 The backups are incremental. They aren't just X amount of files
 times Y backup points. It's X amount of files plus the amount of
 changes over a configurable time period that in the default case is
 6 months but some people have it set to 12 months or more.

 The default backup schedule looks like this:

 - Once every four hours, keep 6.
 - Once every day, keep 7.
 - Once every week, keep 4.
 - Once every month, keep 6.

 This means that (without you contacting support to ask for stuff to
 be deleted out of backups), once a file is backed up, it isn't going
 away for 6 months. Even if you delete it off your disk.
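
The spans covered by that default schedule can be worked out with a few
lines of Python. This is only a sketch: the level names are illustrative
labels, and a month is approximated as 30 days, which is an assumption
made here to keep the arithmetic simple.

```python
from datetime import timedelta

# (label, interval between runs, copies kept), taken from the schedule above.
# Treating a month as 30 days is an assumption of this sketch.
DEFAULT_SCHEDULE = [
    ("hourly", timedelta(hours=4), 6),
    ("daily", timedelta(days=1), 7),
    ("weekly", timedelta(weeks=1), 4),
    ("monthly", timedelta(days=30), 6),
]


def coverage(schedule):
    """Map each level to the span of time its kept copies cover."""
    return {label: interval * keep for label, interval, keep in schedule}


def longest_retention(schedule):
    """The longest span any level covers, i.e. how long a file can linger."""
    return max(coverage(schedule).values())


for label, span in coverage(DEFAULT_SCHEDULE).items():
    print(f"{label}: {span.days} days")
print("longest:", longest_retention(DEFAULT_SCHEDULE).days, "days")
```

Someone who has asked for 12 months of retention would simply have a larger
"keep" on the last line of the schedule.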

 e.g., you create:

 /var/tmp/dvd_rip

 of 8GB or whatever and it gets backed up, so it's now accessible
 via:

 /srv/backups/hourly.0/var/tmp/dvd_rip

 Noticing your backup space usage went up by 8GB you delete
 /var/tmp/dvd_rip or otherwise mask it from being backed up.

 The file doesn't disappear out of your backups though. At the next
 run it'll be accessible as:

 /srv/backups/hourly.1/var/tmp/dvd_rip

 and tomorrow it will be:

 /srv/backups/daily.0/var/tmp/dvd_rip

 and so on.
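
The way the path shifts from one iteration to the next can be modelled with
a short Python sketch. It is deliberately simplified and is not the real
backup software: it models only the four-hourly level (keep 6), whereas in
the real schedule a copy is also carried into the daily and later levels,
which is why the file lingers for months. The function names are
hypothetical.

```python
def rotate(snapshots, new_contents, keep=6):
    """Put a new snapshot at index 0 and drop anything beyond `keep`."""
    return ([set(new_contents)] + snapshots)[:keep]


def where_present(snapshots, path, level="hourly"):
    """List the backup paths under which `path` is still accessible."""
    return [
        f"/srv/backups/{level}.{i}{path}"
        for i, contents in enumerate(snapshots)
        if path in contents
    ]


rip = "/var/tmp/dvd_rip"

# First run: the file is on disk, so it lands in hourly.0.
snaps = rotate([], {rip})
print(where_present(snaps, rip))

# The file is deleted from disk, but the next run only shifts the old copy.
snaps = rotate(snaps, set())
print(where_present(snaps, rip))

# Five more runs without the file: it ages out of this level entirely.
for _ in range(5):
    snaps = rotate(snaps, set())
print(where_present(snaps, rip))
```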

 By now you're probably wondering where I am going with this since it
 doesn't explain how a customer can take some action to reduce the
 space their backups use, in fact all I have done is explain how a
 customer CAN'T fix it.

 Well the thing is that at every backup run the oldest iteration is
 being deleted, so on the 6th daily run hourly.5 is being deleted and
 on the 6th monthly run monthly.5 is being deleted.

 Therefore if you identify things that have been backed up for a long
 time but which don't actually need to be, you can delete them from
 your disk or else mask them from being backed up, and as they age
 out they won't take up disk space any more.

 An example might be the files in /var/log/ which change all the
 time so at every hourly run you will back up a new set of them. If
 you decide that you don't want a backup of them every four hours
 then you might mask them from being backed up. This will have
 immediate effect with the next backup run since that is a set of
 logs that got aged out of hourly.5 and never appeared in hourly.0.

 I do take your point though, because there is nothing stopping
 anyone doing the above well before 100% is reached. What I just
 described is also a fairly rare case - the normal cause of suddenly
 going past 100% is mistakenly letting some big transient file be
 backed up, and there's currently no way for the customer to fix that
 by themselves.

 At the moment there is no negative effect from going past 100%
 except that I will write to you and ask you to sort it out. So I
 could be wrong about suspending their backups being an unreasonable
 thing to do.

 I had suggested the option of "you will automatically order more
 disk and be charged for it" as one possible negative consequence,
 and it appeals to me because it's very simple!

 You suggest an alternative negative consequence of "once usage goes
 above 100%, suspend backups". That is also fairly simple, and has
 the advantage that no one gets a bill that they don't intend to pay.
 It has the downside that the customer's backups now will never get
 re-enabled unless they contact me to buy more disk or ask me to
 delete things.

 Which of these makes the most sense?

 Should both options exist for people to choose between?

 If I implemented a way (from the Panel) to nuke the most recent set
 of backups then would that make the "suspend" option the best one as
 the customer can still fix it themselves?

 That is, upon receiving the critical alert that they had now used
 more than 100% of backup space and their backups have been disabled,
 they determine that this is because something large got backed up
 that shouldn't have been backed up. They could then go to the Panel
 and delete the most recent backup run, and then backups start
 working again at the next run, all of this without needing to submit
 support tickets and without being charged any extra.
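
That flow can be sketched in Python as below. Everything here is
hypothetical (the class, its fields and the numbers), and it makes the
simplifying assumption that each run's space is additive, so deleting the
most recent run frees exactly what that run added.

```python
from dataclasses import dataclass, field


@dataclass
class BackupAccount:
    quota_gb: float
    # Space added by each backup run, newest last (a simplification).
    runs_gb: list = field(default_factory=list)

    @property
    def used_gb(self):
        return sum(self.runs_gb)

    @property
    def suspended(self):
        # Backups are suspended once usage goes above 100% of quota.
        return self.used_gb > self.quota_gb

    def run_backup(self, new_data_gb):
        """Run a backup unless suspended; return True if it ran."""
        if self.suspended:
            return False
        self.runs_gb.append(new_data_gb)
        return True

    def delete_latest_run(self):
        """What a 'nuke the most recent set' Panel button might do."""
        if self.runs_gb:
            self.runs_gb.pop()


acct = BackupAccount(quota_gb=10, runs_gb=[6, 1])
acct.run_backup(8)            # a big transient file sneaks in
print(acct.suspended)         # usage is now 15GB of 10GB
print(acct.run_backup(0.5))   # refused while suspended
acct.delete_latest_run()      # customer fixes it from the Panel
print(acct.run_backup(0.5))   # backups resume at the next run
```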

 Now I have typed it out, that does sound rather more friendly than
 sending people bills.

 Cheers,
 Andy

 --
 http://bitfolk.com/ -- No-nonsense VPS hosting
 _______________________________________________
 users mailing list
 users(a)lists.bitfolk.com
 https://lists.bitfolk.com/mailman/listinfo/users
