I hope you don’t mind another “gift”, because it’s time for our “Best of Q4” issue! I’ve gone back over the past few months and pulled out the most popular articles as chosen by you… Enjoy! 🎁🎄⛄
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
I really wish this primer on Grafana Loki existed four years ago when I was making a case for our next logging platform. At the time, the maintainers struggled (imho) to make a compelling case for why Loki was different (besides “Prometheus labels!”) or why I should care. It’s pretty clear the project has found its audience since then and I’m happy to see continued competition in the space.
Speaking of cardinality, here are some tips and examples for tracking down and mitigating the sprawl of high-cardinality metrics in your Prometheus cluster. Love this.
Don’t worry George, you weren’t the only one who missed this. Nice catch.
As someone who largely broke into the monitoring space through a love of TCP/IP networks and troubleshooting, this article sings to me. 💘
I think we all deal with on-call so much we forget that not everyone is experienced with it (or knows how to do it well). An excellent article looking at the spectrum of responsibilities and the guardrails to put in place to optimize your and your team’s on-call rotation.
A C-suite worthy analysis of the Observability landscape. If you’re trying to make a case with your leadership for dedicated resources, this could be a good article to share with them.
Another strong case for OpenTelemetry; not just for its technical capabilities and ability to avoid vendor lock-in, but also for reducing per-node vendor costs. Happy to see these stories are becoming more common every day.
I really enjoyed reading this post and appreciate the author’s honesty, but imho this situation could’ve been easily avoided. I would also encourage them to be mindful of their next technology decision because it sounds like history might repeat itself.
In my experience, Alertmanager tends to be one of those tools you learn through shadowing and tribal knowledge. This post cuts through a lot of that and goes beyond basic setup tips to demystify some of the less obvious aspects of using it in real scenarios.
How to manage the sprawl and maintain discoverability of Grafana dashboards and data is a common theme for most organizations. This post introduces a pattern with the Grafanalib library that sounds like a good option for many.
A framework for writing postmortems, with templates, some solid references, and sample incidents. Even if you already have a solid incident response program in place, you might pick up a few tips.
An excellent technical deep dive into Kubernetes behaviors around CPU requests, limits, and how its design abstractions affect the way we use it.
A creative example for leveraging some of the OTel Collector’s less obvious capabilities.
Organizing and managing alerting rules can be a major hassle as your teams and architecture grow. This post demonstrates a pattern for decentralizing ownership of your alerts using GitOps.
Frankly, I wanted to include this post from Uber Engineering mostly for the absolutely gorgeous visualizations. There’s also some pretty interesting talk about how their load shedding architecture works and demonstrations of its efficacy.
“Oncall is a calendar tool designed for scheduling and managing on-call shifts”
A handy primer on Kubernetes probes and how to make the best use of their respective states.
Lessons learned adopting distributed tracing (and its effects on the rest of their observability stack) inside a Platform team at Adidas Group.
How to configure a relatively modest Prometheus federation for monitoring multiple Kubernetes clusters. However, I’d caution you to be prepared to start looking at other solutions as your complexity and scale grow.
I honestly never gave much thought to the differences between flame charts and graphs before reading this article. A pretty handy guide for understanding when to use each visualization.
Kids these days and their fancy orchestrators and ephemeral runtimes. Learn you some basic Linux debugging commands and the world is your oyster.
See you next
– Jason (@obfuscurity) Monitoring Weekly Editor