Hi Andy
Personally I'd send a warning at 80%, a critical at 90%, and fail any backups
that would take usage beyond 100%.
If you could give users the ability to delete either an entire backup, or
all backups of a specific file/directory, then it is their problem.
I'm not personally fond of the idea of billing people whose backups grow
beyond their limit; it might require a check of the T&Cs they agreed to. But
whatever happens, it certainly shouldn't load you with extra work when a
customer uses more than they've paid for!
Nige
Hi Rodrigo,
On Mon, Dec 30, 2013 at 12:44:59PM -0200, Rodrigo Campos wrote:
On Monday, December 30, 2013, Andy Smith wrote:
- Nagios sends warnings when that usage goes above 95%, sends
critical alerts if it goes above 100%
Critical on 100% is maybe too late?
Having just checked it is actually 95% for warning and 99% for
critical.
I would say maybe ~90% can be critical, as you are clearly running out
of space and won't be able to back up anymore.
That is a fair point although the words "warning" and "critical" are
at the moment just words used in template text in the alert and
there is therefore no significance between them except what the
recipient reads.
Also doubtless different people will consider different percentages
to be what they want.
There isn't a concept of "running out of space" at the moment - if
you go above 100% then your backups still work. You just eventually
get asked by me to pay for more space or have some stuff deleted.
Perhaps it is best if the critical alerts stay at 99% and I allow
the warning percentage to be configurable.
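A minimal sketch of that arrangement, assuming a fixed critical threshold of 99% and a per-customer warning threshold. The function name and defaults are hypothetical; the 0/1/2 return codes follow the usual Nagios plugin convention for OK, WARNING and CRITICAL.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

def check_backup_usage(used_gb: float, quota_gb: float,
                       warn_pct: float = 95.0,
                       crit_pct: float = 99.0) -> Tuple[int, str]:
    """Classify backup usage; warn_pct is the (hypothetical) configurable part."""
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    pct = used_gb / quota_gb * 100
    if pct > crit_pct:
        return CRITICAL, f"CRITICAL: backup usage {pct:.1f}% of quota"
    if pct > warn_pct:
        return WARNING, f"WARNING: backup usage {pct:.1f}% of quota"
    return OK, f"OK: backup usage {pct:.1f}% of quota"

# A customer wanting an earlier warning, as Nige suggests, could pass warn_pct=80.
print(check_backup_usage(85, 100, warn_pct=80))
```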
Or if the size of the last backup times 7 (i.e. in a week you won't be
able to back up anymore) is more than x%, then critical. But maybe it is a
pita to get the size of the last backup?
It doesn't really work like this as there isn't a concept of "size
of last backup" - only files that change are backed up, so if you
had ten 1GB files that did not change since last time then the usage
would be 10GB even though there are two sets of backups.
If one file changed then both versions would be stored, so the usage
across both sets of backups would be 11GB. So, there is a
*differential* of 1GB per backup run, and it is true that I could
take note of this and compare it to how much space is left then
guess how many of these backup runs would fit given the same amount
of diffs every time.
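The arithmetic above can be sketched as follows, assuming unchanged files are stored only once across backup sets (e.g. via hard links) and modelling each set as a mapping from path to (version, size). All names are hypothetical, and the run estimate deliberately ignores space freed as old backups age out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: path -> (version, size in GB). An unchanged file keeps
# the same version and is therefore stored only once.
Snapshot = Dict[str, Tuple[int, float]]

def usage_gb(snapshots: List[Snapshot]) -> float:
    """Space used across all backup sets, counting each (path, version) once."""
    stored: Dict[Tuple[str, int], float] = {}
    for snap in snapshots:
        for path, (version, size) in snap.items():
            stored[(path, version)] = size
    return sum(stored.values())

def runs_until_full(used_gb: float, quota_gb: float,
                    diff_per_run_gb: float) -> Optional[int]:
    """Naive guess at how many more runs fit if every run adds the same diff."""
    if diff_per_run_gb <= 0:
        return None  # no growth, so no meaningful estimate
    free = quota_gb - used_gb
    return max(0, int(free // diff_per_run_gb))

first = {f"file{i}": (1, 1.0) for i in range(10)}  # ten 1GB files
second = dict(first)                               # nothing changed
print(usage_gb([first, second]))                   # 10.0
second["file0"] = (2, 1.0)                         # one file changed
print(usage_gb([first, second]))                   # 11.0
print(runs_until_full(11.0, 20.0, 1.0))            # 9
```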
That is really complicated though and I'm not convinced there'd be
very much value in this compared to just the used percentage.
If you have the size of the last backup, is it possible to add a check
to see if the current backup is X% larger than the last one?
It seems to me (though I'm totally inexperienced and have never dealt
with this) that it could detect early when something got backed up that
shouldn't have been?
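Since there is no per-backup size, only a per-run differential, a check like this would have to compare successive differentials. A minimal sketch; the function name and threshold are hypothetical.

```python
def unusual_growth(prev_diff_gb: float, curr_diff_gb: float,
                   threshold_pct: float) -> bool:
    """True if this run's differential exceeds the previous one by more than threshold_pct percent."""
    if prev_diff_gb <= 0:
        # No baseline to compare against: flag any growth at all.
        return curr_diff_gb > 0
    increase_pct = (curr_diff_gb - prev_diff_gb) / prev_diff_gb * 100
    return increase_pct > threshold_pct

# An 8GB DVD rip landing in a run that normally adds about 1GB:
print(unusual_growth(1.0, 9.0, threshold_pct=200))  # True
print(unusual_growth(1.0, 1.2, threshold_pct=200))  # False
```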
While possible, these just sound like more alerts that people are not
going to be very interested in. For those who do use the backups
service, do you feel that a simple percent used alert isn't good
enough and you need to know about rates of change?
But in any case, the most reasonable thing to do, for me, is to abort
subsequent backups until there is free space.
I'm not sure that is reasonable, and I will explain why below.
Note that although "just suspend the customer's backups as soon as
they go past 100%" initially sounds like a good idea, it may not be,
as it prevents the customer from removing whatever it was they
backed up that they didn't mean to, i.e. fixing it themselves.
Sorry, don't follow you here :-S
The backups are incremental. They aren't just X amount of files
times Y backup points. It's X amount of files plus the amount of
changes over a configurable time period that in the default case is
6 months but some people have it set to 12 months or more.
The default backup schedule looks like this:
- Once every four hours, keep 6.
- Once every day, keep 7.
- Once every week, keep 4.
- Once every month, keep 6.
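A rough sketch of what this default schedule implies, assuming a 30-day month and that a backup point spends roughly interval × keep at each level before moving up or ageing out (an rsnapshot-style assumption, not a description of the actual implementation).

```python
from datetime import timedelta

# Default schedule: (level, interval, number kept). A "month" is taken as 30 days.
SCHEDULE = [
    ("hourly", timedelta(hours=4), 6),
    ("daily", timedelta(days=1), 7),
    ("weekly", timedelta(weeks=1), 4),
    ("monthly", timedelta(days=30), 6),
]

def snapshot_count(schedule) -> int:
    """Total number of backup points kept at any one time."""
    return sum(keep for _, _, keep in schedule)

def approx_max_retention(schedule) -> timedelta:
    """Rough estimate of how long one backup point survives: interval x keep, summed over levels."""
    return sum((interval * keep for _, interval, keep in schedule), timedelta())

print(snapshot_count(SCHEDULE))             # 23
print(approx_max_retention(SCHEDULE).days)  # 216
```

Roughly 216 days is in line with the "isn't going away for 6 months" point below.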
This means that (without you contacting support to ask for stuff to
be deleted out of backups), once a file is backed up, it isn't going
away for 6 months. Even if you delete it off your disk.
e.g., you create:
/var/tmp/dvd_rip
of 8GB or whatever and it gets backed up, so it's now accessible
via:
/srv/backups/hourly.0/var/tmp/dvd_rip
Noticing your backup space usage went up by 8GB you delete
/var/tmp/dvd_rip or otherwise mask it from being backed up.
The file doesn't disappear out of your backups though. At the next
run it'll be accessible as:
/srv/backups/hourly.1/var/tmp/dvd_rip
and tomorrow it will be:
/srv/backups/daily.0/var/tmp/dvd_rip
and so on.
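The walk-through above can be simulated with a small model. This is a hypothetical sketch of rsnapshot-style rotation (the lowest level takes fresh snapshots; each higher level promotes the oldest snapshot of the level below once that level is full), not the actual BitFolk code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model of the default schedule: (level, keep).
LEVELS = [("hourly", 6), ("daily", 7), ("weekly", 4), ("monthly", 6)]
ORDER = [name for name, _ in LEVELS]
KEEP = dict(LEVELS)

class Rotation:
    def __init__(self) -> None:
        # Index 0 is the newest snapshot at each level, e.g. hourly.0.
        self.snaps: Dict[str, List[Set[str]]] = {name: [] for name in ORDER}

    def run(self, level: str, disk: Optional[Set[str]] = None) -> None:
        idx = ORDER.index(level)
        if idx == 0:
            # Lowest level: snapshot whatever is on disk right now.
            new = set(disk or ())
        else:
            # Higher level: promote the oldest snapshot of the level below,
            # but only once that level is full.
            lower = self.snaps[ORDER[idx - 1]]
            if len(lower) < KEEP[ORDER[idx - 1]]:
                return
            new = lower.pop()
        self.snaps[level].insert(0, new)
        # Snapshots beyond the keep count age out and are deleted.
        del self.snaps[level][KEEP[level]:]

    def locate(self, path: str) -> List[str]:
        """Names of every snapshot that still contains path."""
        return [f"{name}.{i}" for name in ORDER
                for i, snap in enumerate(self.snaps[name]) if path in snap]

rip = "/var/tmp/dvd_rip"
r = Rotation()
r.run("hourly", disk={rip})   # backed up once...
print(r.locate(rip))          # ['hourly.0']
r.run("hourly", disk=set())   # ...then deleted from disk
print(r.locate(rip))          # ['hourly.1']
for _ in range(4):
    r.run("hourly", disk=set())
r.run("daily")                # the next daily run
print(r.locate(rip))          # ['daily.0']
```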
By now you're probably wondering where I am going with this since it
doesn't explain how a customer can take some action to reduce the
space their backups use, in fact all I have done is explain how a
customer CAN'T fix it.
Well the thing is that at every backup run the oldest iteration is
being deleted, so on the 6th daily run hourly.5 is being deleted and
on the 6th monthly run monthly.5 is being deleted.
Therefore if you identify things that have been backed up for a long
time but which don't actually need to be, you can delete them from
your disk or else mask them from being backed up, and as they age
out they won't take up disk space any more.
An example might be the files in /var/log/ which change all the
time so at every hourly run you will back up a new set of them. If
you decide that you don't want a backup of them every four hours
then you might mask them from being backed up. This will have
immediate effect with the next backup run since that is a set of
logs that got aged out of hourly.5 and never appeared in hourly.0.
I do take your point though, because there is nothing stopping
anyone doing the above well before 100% is reached. What I just
described is also a fairly rare case; the normal cause of suddenly
going past 100% is mistakenly letting some big transient file be
backed up, and there's currently no way for the customer to fix that
by themselves.
At the moment there is no negative effect from going past 100%
except that I will write to you and ask you to sort it out. So I
could be wrong about suspending their backups being an unreasonable
thing to do.
I had suggested the option of "you will automatically order more
disk and be charged for it" as one possible negative consequence,
and it appeals to me because it's very simple!
You suggest an alternative negative consequence of "once usage goes
above 100%, suspend backups". That is also fairly simple, and has
the advantage that no one gets a bill that they don't intend to pay.
It has the downside that the customer's backups now will never get
re-enabled unless they contact me to buy more disk or ask me to
delete things.
Which of these makes the most sense?
Should both options exist for people to choose between?
If I implemented a way (from the Panel) to nuke the most recent set
of backups then would that make the "suspend" option the best one as
the customer can still fix it themselves?
That is, upon receiving the critical alert that they had now used
more than 100% of backup space and their backups have been disabled,
they determine that this is because something large got backed up
that shouldn't have been backed up. They could then go to the Panel
and delete the most recent backup run, and then backups start
working again at the next run, all of this without needing to submit
support tickets and without being charged any extra.
Now I have typed it out, that does sound rather more friendly than
sending people bills.
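The "suspend, but let the customer fix it from the Panel" flow described above can be sketched as a tiny state model. Everything here is hypothetical, and it simplifies by treating each run's added space as independent of the others.

```python
from typing import List

class BackupAccount:
    """Hypothetical model of 'suspend above 100%' plus a Panel delete button."""

    def __init__(self, quota_gb: float) -> None:
        self.quota_gb = quota_gb
        # Space each backup run added, oldest first (a simplification:
        # real differential runs share unchanged files).
        self.runs: List[float] = []

    def usage_gb(self) -> float:
        return sum(self.runs)

    def suspended(self) -> bool:
        return self.usage_gb() > self.quota_gb

    def attempt_backup(self, diff_gb: float) -> bool:
        """Run a backup unless usage is already over quota."""
        if self.suspended():
            return False
        self.runs.append(diff_gb)
        return True

    def delete_latest_run(self) -> None:
        """Drop the most recent backup run, as the proposed Panel option would."""
        if self.runs:
            self.runs.pop()

acct = BackupAccount(quota_gb=15)
acct.attempt_backup(10)        # initial backup
acct.attempt_backup(1)         # normal change
acct.attempt_backup(8)         # the accidental 8GB DVD rip
print(acct.suspended())        # True
print(acct.attempt_backup(1))  # False: backups are suspended
acct.delete_latest_run()       # customer fixes it from the Panel
print(acct.suspended())        # False
print(acct.attempt_backup(1))  # True: backups resume
```

Note that checking before each run lets the run that crosses 100% complete, matching "once usage goes above 100%, suspend backups"; failing the run that would cross 100% would require knowing its size in advance.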
Cheers,
Andy
--
http://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
users mailing list
users(a)lists.bitfolk.com
https://lists.bitfolk.com/mailman/listinfo/users