Hi,
Thanks to suggestions from the beta testers I've done more work on
the Grafana setup. It's probably still not quite ready yet but could
do with a bit more testing so I've opened it up to everyone now.
So if you have a few minutes, please take a look at:
https://tools.bitfolk.com/grafana/
I've made it authenticate against the Panel, so there is no need to
enter credentials in multiple places any more. It should redirect you
around as necessary.
CPU graphs in Cacti at the moment show as % of a core, so if you
have 2 vCPU (as all BitFolk customers do) then the graph can show as
high as 200%. I copied that behaviour into Grafana but feedback was
that people wanted it to be % of total CPU, i.e. strictly between 0
and 100%. So I did that.
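
To illustrate the difference between the two scales, here is a minimal
Python sketch. It is not what the dashboards actually run; the function
name and the example readings are made up for illustration.

```python
def percent_of_total(percent_of_core: float, vcpus: int) -> float:
    """Convert a '% of one core' reading into '% of total CPU'.

    Hypothetical helper: with 2 vCPUs a per-core style reading can reach
    200%, so dividing by the vCPU count brings it back into 0-100%.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return percent_of_core / vcpus


# A VM with 2 vCPUs, both fully busy (the old-style graph shows 200%).
print(percent_of_total(200.0, 2))  # 100.0
# One core busy, the other half busy (old-style 150%).
print(percent_of_total(150.0, 2))  # 75.0
```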
The traffic direction is also now presented from the perspective of
your VM - Cacti shows from the perspective of the BitFolk bare metal
host so the label for "In" means "Out of your VM". I thought it
might be less confusing to flip those directions from the start for
this.
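
The flip itself is just a swap of the two counters. A rough sketch,
assuming a sample is a dict with hypothetical "in" and "out" keys as
seen from the bare metal host (these names are not the real metric
names):

```python
def vm_perspective(host_sample: dict) -> dict:
    """Relabel host-side traffic counters from the VM's point of view.

    What the host counts as "in" on the VM's interface is traffic
    leaving the VM, and vice versa, so the two values swap places.
    """
    return {"in": host_sample["out"], "out": host_sample["in"]}


# Host-side view: 10 units "in" (sent by the VM), 500 units "out"
# (received by the VM).
host_view = {"in": 10, "out": 500}
print(vm_perspective(host_view))  # {'in': 500, 'out': 10}
```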
If you have multiple VMs then there will be a dashboard for each
one. Click the icon on the left that looks like four squares and
then "Manage" to see them all.
I had feedback from one person that graphs with a negative Y axis are
less clear, but the few other testers who got back to me on that said
they didn't see a problem with it. That was, however, before anyone
had a chance to compare the two, and it's still a really small sample
of people. So I've added alternate dashboards that don't use a
negative Y axis. I'd like some feedback on which of the two you
prefer.
Whichever of those is most preferred I will leave as default, but
will probably still keep the alternate.
And of course, any more feedback about how it looks and is laid out
would be welcome. Also other feedback about what more could be done,
though as I say initially I am only trying to replace and retire
Cacti.
Mini FAQ (will go into a wiki page at some point):
Q. Will you be importing the 5+ years of data from Cacti?
A. Sadly not. Prometheus doesn't have that sort of input feature
(yet):
https://github.com/prometheus/prometheus/issues/535
I will keep the data around and visible, but it won't be updated.
If it later becomes possible to import then I'll try to do that.
Q. How much retention will there be in Prometheus?
A. The default setting is just 15 days, but I've optimistically set
it to 1 year. Depending on how it goes that may need to be
shortened or it might be fine.
I understand that external [to Prometheus] storage can be used
with something like Thanos keeping the older data while
Prometheus only deals with the more recent:
https://improbable.io/blog/thanos-prometheus-at-scale
So when the time comes I'll look at switching to that sort of
thing, but it doesn't need to be dealt with just yet.
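
Whether 1 year is realistic mostly comes down to disk space. The
Prometheus documentation gives a rough sizing rule: retention time in
seconds, times ingested samples per second, times bytes per sample
(typically 1 to 2 bytes). A sketch of that arithmetic, where the
ingestion rate of 1,000 samples per second is purely a made-up figure
and not a measurement of BitFolk's setup:

```python
SECONDS_PER_DAY = 86_400


def estimated_disk_bytes(retention_days: int,
                         samples_per_second: float,
                         bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention seconds * ingest rate * sample size."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical ingestion rate; substitute a real one before trusting this.
one_year = estimated_disk_bytes(365, 1_000)
print(f"{one_year / 1e9:.1f} GB")  # 63.1 GB
```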
Q. Will you allow it to graph things other than the dashboards
you've supplied?
A. Maybe. The offer was always there for Cacti, but there is
currently only one customer making use of that. Conceivably if
you know what you want graphed we can come up with a dashboard
that shows it. This may require you to install an agent of some
kind (e.g. node_exporter).
Full edit access to BitFolk's Grafana is not likely to happen any
time soon.
Q. Will you expose Prometheus metrics related to our VMs so we can
scrape them ourselves?
A. Probably not any time soon. The current metrics mostly come
straight from node_exporter on our bare metal hosts, and they're
all mixed together with other customers' metrics and
infrastructure metrics. The access control would be a lot of work.
Q. Can I share links from Grafana?
A. You can explicitly share snapshots of whatever you are looking
at. There's a "Share dashboard" icon on the top towards the
right. It offers you the option of sharing a "Link", "Snapshot"
or "Export".
"Snapshot" is the one you want. The resulting link will let
people see whatever you are seeing. It won't update.
"Link" would show people an interactive dashboard, but it's only
available to authorised users, which basically means you and the
admin account, so not very useful.
"Export" is for getting the JSON definition of the dashboard out,
and since you don't have access to the metrics it is again not
very useful for you.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting