Some fun and interesting articles this week. Loving the emphasis on practical monitoring and troubleshooting practices. And flame graphs! 🔥📈🔔
This issue is sponsored by:
Regardless of where you are on your incident management maturity journey, there’s a right next step you can take. Learn about three areas of focus — roles, services, and retros — why they’re important, and how to improve at any level in "3 ways to improve your incident management program in 2023."
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
I absolutely love debugging stories like this one from Airbnb working through some Istio performance issues in production. 🤩
Probably one of the more honest appraisals of OpenTelemetry’s strengths and areas for improvement.
This engineer not only created a Grafana dashboard for monitoring the OpenTelemetry Collector, but also wrote up a page with diagrams explaining the flow and respective metrics. Great work!
One company’s journey planning for a new Observability platform, going through the usual considerations like build-vs-buy, open source versus commercial, and how to ensure it could scale with their growth.
Unstructured logs continue to be a pillar of Observability for many companies, but they can be so much more. This post shows how a bit of planning can yield richer data using structured events.
A look at one of the more common use cases for push metrics in a Prometheus-monitored architecture.
I think most folks here are probably aware of this feature, but it’s a great thing to share with anyone who might be newer to Prometheus and Alertmanager.
A broad set of practices for troubleshooting AWS services. Most of this is probably a refresher, but it’s a good reminder to make sure your processes and strategies are up to date.
A medium-severity security fix affecting the use of Grafana’s Graphite data source.
The folks at Sysdig take a closer look at a couple of common Kubernetes container errors and how you can monitor them effectively.
“Singed makes it easy to get a flamegraph anywhere in your code base.”
Monitorama has announced their full agenda for this year’s event. Looks like an awesome collection of topics and speakers. Hope to see you there!
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor