With the most recent Monitorama behind us, it feels like a great time for our quarterly “best of” issue! We have some fantastic articles here covering the most popular topics and themes from the past few months. Enjoy!
This issue is sponsored by:
In a single afternoon I set it up then ran 500 deployments, a dozen different ways…
When was the last time you ran one deployment without a hiccup, let alone 500? Learn how declarative deployment with a GitOps experience makes all the difference with Armory.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Reading about how others think about alerts (and the failures of bad alerts) is something I’ll never get tired of, and unfortunately, something I think we’ll never really master as a discipline. Still, it’s important to share our learnings and continue evolving our practice.
How one team profiled their Prometheus metrics usage and found a massive storage savings win.
This is a bit of a one-off, but it reminds me that monitoring systems can be very complicated beasts. Easy diagramming is a win for everyone, admins and users alike.
A brief look at the history and differences between two popular open source Observability agents.
Kubernetes does a good job of managing resources, but it’s naive to think we won’t need to troubleshoot it like any other system from time to time. And knowing how to debug something is the first step to monitoring it effectively.
I’m a little sad that this tool needs to exist (I wrote something similar for Graphite a decade ago?) but it does address a valid need. Worth adding to your sack of monitoring tools.
I don’t believe that traditional instrumentation approaches are going anywhere, but eBPF is making a strong case for less intrusive metrics collection.
A recap from KubeCon with a particular emphasis on how OpenTelemetry continues to dominate observability instrumentation and what might be next.
Some lessons learned for reducing custom metrics usage with Datadog’s “Metrics without Limits” feature. Woof.
A vendor-agnostic look at what Observability really means in a systems context. Useful for SRE folks and anyone else who cares for production services but might not actually be developing the services themselves.
An overview of Expedia’s most important operational metrics across a variety of use cases and service types. If you’re a technical leader in your group, it might be a fun exercise to review these with your team.
A fascinating look at Pinterest’s anomaly detection platform and the algorithm choices they’ve made in its design.
An excellent look at the state of distributed tracing, acknowledging the pains that we’ve experienced up to this point, and some thoughts on where the discipline might be heading.
As a fan of push-based metrics collection, I’m not sure I buy into the rhetoric here, but this is a very good look at Prometheus’ strengths and how to use its multitude of features.
OpenTelemetry already has a strong reputation for portability but this example really underscores just how easy it is to switch your final destination(s) using OTel collectors.
There are some valid frustrations here, but I’ve been in this industry for a long time and literally every piece of software is going to cause heartache sooner or later. Still, I encourage everyone to read this and go make a positive impact where you can.
An engineer at tb.lx talks about their adoption of observability practices and tooling, and how it’s leading to better outcomes not just for their internal teams and the business, but also for understanding customer issues.
I love this story from a ZipRecruiter engineer, recalling their path from Icinga and Graphite monitoring to adopting Prometheus for their Kubernetes infrastructure. Teams considering a similar adventure would do well to heed these learnings.
One company’s take on running their observability stack (Grafana, Loki, Tempo, and Prometheus) on Kubernetes.
A hot take on dashboards. Sorta. I don’t really get the argument that single pane dashboards are good or bad. Any dashboard is only as good as the effort you put into it to make it answer the questions that are relevant to your needs.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor