Hi,
We were recently made aware that our Cacti¹ bandwidth graphs for a
particular customer were dramatically different from reality.
On investigation I realised that it was a bug in the Linux kernel
where it wasn't using 64-bit counters for Xen backend network
devices. As a result, for readings above ~228Mbit/s the counter was
wrapping twice and reporting incorrect values on an SNMP read (as
used by Cacti).
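To illustrate why a wrapped counter loses information, here is a
minimal sketch in Python. It models a generic 32-bit byte counter
only; the function name is made up for illustration and none of this
is taken from the kernel or from Cacti.

```python
# Sketch only: models a generic 32-bit byte counter, not real kernel or Cacti code.
COUNTER_SPAN = 2**32  # number of distinct values a 32-bit counter can hold


def counter_after(start, bytes_transferred):
    """Value a 32-bit counter shows after counting bytes_transferred more bytes."""
    return (start + bytes_transferred) % COUNTER_SPAN


one_gib = 2**30
small = counter_after(0, one_gib)                     # 1 GiB of traffic
large = counter_after(0, one_gib + 2 * COUNTER_SPAN)  # 9 GiB of traffic, two wraps
print(small == large)  # prints True: the two readings are indistinguishable
```

Once the counter has wrapped more times than the poller can account
for, very different amounts of traffic produce the same reading.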
This is a fairly minor issue because we do not use SNMP counters for
billing. It does mean that if your VPS has ever averaged more than
about 228Mbit/s over a 5-minute period, Cacti won't be showing it
properly.
The kernel bug has since been fixed but deploying a fixed version
would involve using a self-compiled² backports kernel. I am not
going to do this because I haven't tested it enough yet.
Instead I have identified the 30 or so customers who have recorded
that much bandwidth use in the last 12 months and am adding new
bandwidth graphs for each of them, using 1-minute polling. New
customers will also get the 1-minute resolution graphs. That should
be safe up to about 1.1Gbit/s.
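For anyone who wants to check the arithmetic, here is a rough sketch
in Python. It assumes the ceiling is two full spans of a 32-bit byte
counter per polling interval, which is the reading that matches the
figures above; the names are made up for illustration and this is not
how Cacti itself computes anything.

```python
# Sketch only: assumes the ceiling is two full spans of a 32-bit byte
# counter per polling interval. That assumption matches the figures in
# this email; it is not taken from Cacti's documentation.
COUNTER_SPAN_BYTES = 2**32
WRAPS_TOLERATED = 2  # assumption, see above


def ceiling_bits_per_second(poll_interval_seconds):
    """Highest average rate that stays within the tolerated number of wraps."""
    max_bytes = WRAPS_TOLERATED * COUNTER_SPAN_BYTES
    return max_bytes * 8 / poll_interval_seconds


for interval in (300, 60):
    mbit = ceiling_bits_per_second(interval) / 1e6
    print(f"{interval}s polling: about {mbit:.0f} Mbit/s")
```

Under that assumption, 5-minute polling comes out at roughly
229Mbit/s and 1-minute polling at roughly 1.1Gbit/s, so the ceiling
scales inversely with the polling interval.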
So, if you are looking at Cacti and see you now have two bandwidth
graphs with one cutting off where the other begins, this is the
reason why.
I wrote a blog article about this at:
http://strugglers.net/~andy/blog/2017/09/03/when-is-a-64-bit-counter-not-a-…
Cheers,
Andy
¹ https://tools.bitfolk.com/cacti/ - Log in with your usual BitFolk
  credentials
² We already use self-compiled kernels based on Debian kernel
packages, because some security patches have not yet made it into
Debian's packages. Building the kernel isn't the problem; testing
it well enough is.
--
https://bitfolk.com/ -- No-nonsense VPS hosting