This week has some serious “where are we and how did we get here” vibes. From dashboard design to telemetry and tracing collection to the history of observable systems, we’ve got a bit of everything. Enjoy! 📈📚🍻
This issue is sponsored by:
Ready to kickstart your incident response improvement efforts in 2023? Join FireHydrant on Wednesday, Feb. 8 for a webinar on How to evaluate and improve how you manage incidents. Learn what metrics you should monitor, common benchmarks, and how to show improvements and prove ROI.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
I “grew up” in this industry cutting my teeth on dashboard design and usability. Tools like Grafana make this a lot easier than it used to be, back when we were crafting our own charts and pages using D3.js. Still, it can be almost too easy to vomit a bunch of graphs on a monitor and call it a day. This article does a good job calling out the design considerations that will turn the dashboard into a truly useful resource for your team.
Another examination of dashboard design, this time with an emphasis on the telemetry and signals used to inform our dashboards and the responders who rely on them.
If you’re not already leveraging logs in your Observability story, this author has a bone to pick with you. Seriously though, this is a solid look at why structured logging can help you surface more insights from your systems.
Really appreciate when an engineer works through a complex problem and shares their solution publicly. I learned a lot more about ElastAlert (and a little Scala) than I expected, tbqh.
We haven’t seen many distributed tracing stories lately, so it’s refreshing to include a guide for setting up OpenTelemetry spans with .NET projects.
Wait, didn’t I just say… oh well, here’s another distributed tracing example. This time for using OpenTelemetry with Golang and Python services. 😂
If you enjoyed the OpenTelemetry articles above but haven’t taken the plunge for yourself, this article provides a thorough comparison of the major players in open source distributed tracing backends.
MetricFire is a hosted monitoring solution that allows you to get the data you need. We offer an easy-to-use product with beautiful open-source dashboards and enhanced alerting. Using a tried and true Graphite infrastructure, MetricFire is the fastest and easiest way to monitor your metrics. Get started for free or book a demo here. (SPONSORED)
This post touches on a variety of incident-related topics without taking itself too seriously. If you enjoy this one and consider yourself new to SRE topics, I’d recommend hopping over to Google’s site and reading the free online copies of their SRE books.
I love that this author somehow managed to squeeze concepts and details from a dozen different references into one cohesive story. Breeze through this one and then allow yourself to dive into the list of references at the end.
A top-down review of SLOs, error budgets, and a variety of tools to help your teams manage them.
A look back on the types of telemetry sources that have influenced how we think about Observability in modern software systems.
New Grafana 9.3.x and 9.2.x releases to address high and medium severity CVEs.
This is the final week to submit talks for this year’s Monitorama PDX 2023 event. Get your proposal in by the Feb 3 deadline! 🤩
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor