Issue 032
Hey folks, welcome to another installment of Monitoring Weekly! Did you write something about monitoring recently? Maybe got an idea rolling around in your head? Send it on over and let the community learn from you. :D
Monitoring News, Articles, and Blog posts
Google Cloud Platform Blog: Building good SLOs - CRE life lessons
If you’ve read the SRE Book, you’re probably already familiar with SLOs (Service Level Objectives); if you haven’t read the book, it’s well worth it. This article is a simplified version of what you’ll find in the book, but I quite like how straight to the point it is. My favorite reminder from it: you need to retain service availability data long-term in order to report on it effectively and tell whether or not you’re improving.
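Why long-term retention matters becomes obvious with a little arithmetic: you can only compare this period against the last one if the last one is still around. Here's a minimal Python sketch, assuming you keep daily rollups of good and total request counts. The `DailyCounts` record, the example numbers, and the 99.9% target are hypothetical and not from the article.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    """One day's rollup of request outcomes (a hypothetical record shape)."""
    day: str
    good: int
    total: int


def availability(days):
    """Fraction of good requests across all the given days (1.0 if there was no traffic)."""
    good = sum(d.good for d in days)
    total = sum(d.total for d in days)
    return good / total if total else 1.0


def meets_slo(days, target):
    """True if availability over the whole window is at or above the target."""
    return availability(days) >= target


def window_trend(days, window):
    """Availability per consecutive, non-overlapping window of `window` days.

    Trailing days that don't fill a complete window are ignored.
    """
    return [availability(days[i:i + window])
            for i in range(0, len(days) - window + 1, window)]


if __name__ == "__main__":
    history = [
        DailyCounts("day1", 999, 1000),
        DailyCounts("day2", 990, 1000),
        DailyCounts("day3", 1000, 1000),
        DailyCounts("day4", 1000, 1000),
    ]
    print(availability(history))      # overall: 3989 / 4000
    print(meets_slo(history, 0.999))  # False
    print(window_trend(history, 2))   # earlier window is lower than the later one
```

The trend list is the part that disappears if you only keep a few days of data: with just the last window, you'd see 100% and have no idea whether things got better or you simply had a quiet stretch.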
Infrastructure Monitoring with TICK Stack
I don’t know why, but I hadn’t seen an article that walks through setting up and using the full TICK stack (Telegraf, InfluxDB, Chronograf, and Kapacitor) before this one. Most cover only pieces of it (usually Telegraf and/or InfluxDB), so it’s especially interesting to see how they all fit together.
Comparing 10 Container Monitoring Solutions for Rancher
Looking for the right solution for monitoring your Docker-based infrastructure? The folks at Rancher have updated their assessment, looking at ten different solutions and the pros/cons of each.
Metrics: not the observability droids you’re looking for
High cardinality in metric data has always been something of a pain point. If you’ve ever been on the unfortunate end of someone encoding a UID into a metric path and watched it bring a metrics server to its knees, you know how tough it is. This article makes the case that high cardinality isn’t just desirable; it’s required. And while we’re at it, we should all stop assuming it’s impossible to achieve.
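To see why a UID in a metric name hurts a traditional metrics store, remember that every distinct combination of dimension values becomes its own time series. This Python sketch counts the worst case, where every combination shows up. The metric name, label names, and counts (10 hosts, 5 endpoints, 10,000 users) are made-up numbers for illustration.

```python
from itertools import product
from math import prod


def series_keys(metric, labels):
    """Yield one series key per combination of label values (worst case: all occur)."""
    names = sorted(labels)
    # dict.fromkeys de-duplicates values while keeping their order.
    value_lists = (dict.fromkeys(labels[n]) for n in names)
    for combo in product(*value_lists):
        pairs = ",".join(f"{n}={v}" for n, v in zip(names, combo))
        yield f"{metric}{{{pairs}}}"


def series_count(labels):
    """Worst-case number of series: the product of distinct values per label."""
    return prod(len(set(values)) for values in labels.values())


if __name__ == "__main__":
    base = {
        "host": [f"web{i:02d}" for i in range(10)],
        "endpoint": ["/", "/login", "/search", "/cart", "/checkout"],
    }
    with_uid = dict(base, user_id=[f"u{i}" for i in range(10_000)])
    print(series_count(base))      # 50
    print(series_count(with_uid))  # 500000
```

In practice not every combination appears, but a per-user dimension still makes the series count grow with your user base rather than with your infrastructure, which is exactly the load that stores built around a modest number of series struggle with.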
Using Check Hooks – The Sensu Blog
I know a good number of you are Sensu users, so you’ll be interested in the newest feature in Sensu Core 1.1: Check Hooks. They let a client node run an arbitrary command on itself based on a check’s exit code. This opens up all sorts of possibilities: basic auto-remediation à la Monit, inclusion of context information à la nagios-herald, and more.
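To make the idea concrete, here's a minimal Python sketch of the mechanism as described: run a check, look at its exit code, and if a hook matches, run that command on the same host and keep its output alongside the check result. This is not Sensu’s implementation or its configuration schema. The `hooks` dictionary, its keys, and the lookup order (exact exit code, then severity name, then a catch-all `non-zero`) are assumptions for illustration; the exit codes follow the usual Nagios plugin convention (0 OK, 1 warning, 2 critical, anything else unknown).

```python
import subprocess
import sys

# Conventional Nagios-plugin exit codes; anything else is treated as "unknown".
SEVERITY = {0: "ok", 1: "warning", 2: "critical"}


def select_hook(exit_code, hooks):
    """Return the hook command for this exit code, or None.

    Lookup order (an assumption for this sketch): the exact code as a string,
    then the severity name, then a catch-all "non-zero" key for failures.
    """
    severity = SEVERITY.get(exit_code, "unknown")
    for key in (str(exit_code), severity):
        if key in hooks:
            return hooks[key]
    if exit_code != 0:
        return hooks.get("non-zero")
    return None


def run_check(check_cmd, hooks, timeout=10):
    """Run a check locally, then run the matching hook (if any) and capture its output."""
    check = subprocess.run(check_cmd, capture_output=True, text=True, timeout=timeout)
    hook_cmd = select_hook(check.returncode, hooks)
    hook_output = None
    if hook_cmd is not None:
        hook = subprocess.run(hook_cmd, capture_output=True, text=True, timeout=timeout)
        hook_output = hook.stdout
    return {"status": check.returncode, "output": check.stdout, "hook_output": hook_output}


if __name__ == "__main__":
    # Hypothetical check that always reports CRITICAL; its hook just prints some context.
    failing_check = [sys.executable, "-c", "print('CRITICAL: disk full'); raise SystemExit(2)"]
    hooks = {"critical": [sys.executable, "-c", "print('context: df output would go here')"]}
    print(run_check(failing_check, hooks))
```

A hook that restarts a service gives you the Monit-style remediation; one that dumps disk usage or recent log lines gives you the nagios-herald-style context.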
Full disclosure: My company, Aster Labs, is a Sensu Partner. I received no consideration, financial or otherwise, for including this post.
RichiH/OpenMetrics: Evolving Prometheus exposition format into a standard.
One of the most frustrating problems we’ve encountered during the rise of metrics has been the lack of an expressive standard spec. Graphite’s dot-delimited spec is quite limiting (for example, it forces you to encode dimensions in the metric name). statsd solved some of those issues but certainly has its own problems. I’m a little bummed that Metrics 2.0 didn’t get widely adopted. All that said, there’s a new kid on the block: OpenMetrics. It’s still very early days for the spec, but I’m liking what I’m seeing so far.
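The difference is easiest to see side by side. This Python sketch renders the same measurement as a Graphite plaintext line, where dimensions have to be folded into an ordered, unnamed dotted path, and as a Prometheus-style text exposition line, the format OpenMetrics starts from, where they stay as named labels. It's simplified: it omits `# TYPE`/`# HELP` metadata and anything OpenMetrics-specific, since that spec is still early. The metric names and label values are hypothetical.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline for a text-exposition label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def graphite_line(path_parts, value, timestamp):
    """Graphite plaintext protocol ("path value timestamp"): dimensions live in the path."""
    return f"{'.'.join(path_parts)} {value} {timestamp}"


def exposition_line(name, labels, value):
    """Prometheus-style exposition line: dimensions stay as named key/value labels."""
    if not labels:
        return f"{name} {value}"
    rendered = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{rendered}}} {value}"


if __name__ == "__main__":
    # Graphite: you must remember that position 3 is the host, 4 the method, 5 the status.
    print(graphite_line(["http", "requests", "web01", "GET", "200"], 1027, 1500000000))
    # Exposition format: each dimension carries its own name.
    print(exposition_line("http_requests_total",
                          {"host": "web01", "method": "GET", "status": "200"}, 1027))
```

With named labels, adding a new dimension doesn't reshuffle every existing path, and queries can filter on `status="200"` without knowing where in a dotted name the status happens to sit.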
See you next week!
– Mike (@mike_julian) Monitoring Weekly editor