Some fantastic articles this week, with a strong emphasis on troubleshooting tips, tools, and techniques. Enjoy! 🔧☕😍
This issue is sponsored by:
What are the potential pitfalls of Prometheus-based monitoring, and how can teams successfully address them? Chronosphere is teaming up with the Co-founder of Prometheus to share the potential roadblocks and discuss important best practices to get the most from your cloud native monitoring. Register for the webinar now.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
An impressive collection of hard lessons learned and precautionary planning steps to take ahead of your next outage troubleshooting session.
This is a well-reasoned and well-written article from someone I respect a lot. That said, I don’t agree that it has to be the case for companies with the foresight to avoid painting themselves into a corner. If this does sound like you, please share this post with your peers and reconsider how you’re building your systems. </rant>
Some hard truths about “out of the box” distributed tracing support among service meshes.
If you’re looking for a free alternative to uptime monitoring services, check out this OSS project running on Fly.io. This feels particularly timely given Heroku’s decision to discontinue free dynos.
A look at how Meta evolved their SLO management platform to support annotations for richer context. This is a really nice pattern for informing your on-call teammates, especially between shift transitions.
An interesting look behind the curtain at how Grafana Labs diagnosed an incident in their hosted service. I’ll be honest, seeing the fix makes me glad I don’t have to support that configuration. 😬
Want to make Kubernetes clusters highly available? Adevinta's tech teams achieve high availability while operating eight clusters serving 54k requests per second over 20 tenants. Read more about their internal microservices platform and how they deliver a reliable service for tenants. Blog post here. (SPONSORED)
I’m not sure how common this use case is, but if you’re one of those looking to migrate from InfluxDB to Prometheus, this looks like a huge win.
A friendly reminder to check your Docker logging configuration before it wakes you up at night.
This list of CLI tools is heavy on debugging utilities, with many landing somewhere between “things I use every day” and “things I forgot even existed”.
A high-severity fix affecting both Grafana and the dedicated Image Renderer. Please upgrade your installations ASAP.
“… a self-hosted monitoring tool like “Uptime Robot”.”
“A tool to migrate Grafana dashboards based on InfluxDB source to Prometheus source…”
Grafana Labs has announced their next event, taking place this fall in New York. They’re accepting CFP submissions through September 9, 2022.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor