Had a great time last week at Monitorama watching all the talks and seeing so many familiar faces. Feels serendipitous to come back and discover so many stories this week about production incidents, learning from our mistakes, and more. Enjoy! 🌞🍹📈
This issue is sponsored by:
Can you rely on your deployments?
In a recent Armory and Gartner report, 35% of respondents named reliability and consistency as their top pain point with app deployment. If you need help with consistent, reliable deployments, try Armory Continuous Deployment-as-a-Service. Check out more in the reports here.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
A genuine look at observability and its impact on our work from the perspective of a web developer.
This post is sincerely interesting; it starts off almost as a chapter from a novel before pivoting hard into thoughtful considerations for crafting effective alerts.
Most companies I’ve seen struggle with quantifying the impact of an outage. Props to this HelloFresh engineer for sharing how they model incidents and derive actionable insights.
We’ve all been there… that moment of realization that you just did something very, very wrong and there’s no way to take it back (in my case, an errant rm -rf / at an OpenBSD hackathon). Still, this is how we learn from our mistakes and build more resiliency into our systems.
Maybe I’m biased because I used other time series query languages (Graphite, Librato, etc.) for many years before Prometheus came along, but I agree… PromQL can be a hassle to master. This post explains why it can feel that way and introduces a new open source project to help make it easier.
This might be a bit of a niche concern for our audience, but if you happen to be applying machine learning to your time series data, you’ll probably appreciate reading how Etsy stumbled across some potential issues.
Preach, we should always strive to learn from (and avoid recurrences of) our mistakes.
Some handy tips (and explanations for why they matter) for improving your Loki queries.
A first look at SkyWalking’s new continuous profiling capabilities.
“Autometrics uses instrumented function names to generate Prometheus queries so you don’t need to hand-write complicated PromQL.”
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor