Plenty of variety in this week’s collection, with stories covering everything from logging, metrics and tracing to service levels and runbooks. Oh, and the Learning From Incidents conference released their videos, so make sure to check those out too! 🍩☕📈
This issue is sponsored by:
Regardless of where you are on your incident management maturity journey, there’s a right next step you can take. Learn about three areas of focus — roles, services, and retros — why they’re important, and how to improve at any level in "3 ways to improve your incident management program in 2023."
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
I love this post from one of Intercom’s engineering leaders on how they instrument availability into their organization with a combination of culture, technical foundations, and risk management. Someone please make this into a podcast.
Everyone knows the importance of runbooks, but so few teams invest in and care for them properly. Here are some tips to get you started.
I missed most of the Learning From Incidents (LFI) conference last month, but this talk on collaborative incident response between companies really stood out to me. The full playlist is available for viewing here.
A comparison of traditional OpenTelemetry span relationships versus span links, in the context of debugging a message bus. Good stuff.
I’m pretty skeptical of industry surveys but they generally yield some interesting data. This one is no different in either regard, but I wish they’d be more transparent about how the survey was conducted.
Service Levels are a key concept for Site Reliability Engineering, and we use them for understanding the health of any service. This post offers a quick primer on each, with some brief but helpful examples.
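If a worked example helps, here is a minimal sketch of how an availability indicator (SLI) can be checked against an objective (SLO) and turned into a remaining error budget. It assumes a simple request-based SLI; the class name, function name, and numbers are hypothetical and are not taken from the linked post.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityWindow:
    """Request counts observed over one measurement window (hypothetical shape)."""
    total_requests: int
    good_requests: int

    def sli(self) -> float:
        """Fraction of requests served successfully; 1.0 when there was no traffic."""
        if self.total_requests == 0:
            return 1.0
        return self.good_requests / self.total_requests


def error_budget_remaining(window: AvailabilityWindow, slo_target: float) -> float:
    """Return the unspent share of the error budget (negative once overspent).

    slo_target is a fraction such as 0.999 for "three nines".
    """
    allowed_failures = (1.0 - slo_target) * window.total_requests
    actual_failures = window.total_requests - window.good_requests
    if allowed_failures == 0:
        # Either no traffic or a 100% target: any failure exhausts the budget.
        return 0.0 if actual_failures else 1.0
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Assumed numbers: one million requests, 500 of them failed.
    window = AvailabilityWindow(total_requests=1_000_000, good_requests=999_500)
    slo = 0.999
    print(f"SLI: {window.sli():.4%}")
    print(f"SLO met: {window.sli() >= slo}")
    print(f"Error budget remaining: {error_budget_remaining(window, slo):.1%}")
```

Treating the budget as a fraction rather than a raw failure count makes it easier to compare services with very different traffic volumes.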
Are you tired of paying too much for your monitoring solution?
Expensive, name-brand monitoring and trying to self-host are both a pain. We believe monitoring should be accessible for all developers, and our simple, low-cost pricing reflects that. For more information on how MetricFire can save you money, check out our pricing page and reach out for a quote about how much you could be saving. (SPONSORED)
I’m new to Pyroscope, but this looks like an interesting project for continuous profiling of your application.
Monitoring the health of any time-series database is crucial, and InfluxDB is no different in this regard. This admin shares a quick tip for ensuring you have access to its internal statistics.
This article goes well beyond the titular subject, diving into a range of observability-related topics. I would have appreciated a less biased look at pull versus push (there are strong advantages for the latter), but it’s an informative read nonetheless.
Autoscaling is one of those things that always sounds better in theory than in practice. Regardless, I love having it in my toolkit, and with this guide you can have it scaling based on your Prometheus metrics in no time.
A comprehensive comparison of five of the most popular logging frameworks for Java applications.
“Pyroscope is an open source continuous profiling platform.”
Monitorama is returning to Portland, OR next summer. The 2022 conference was a fantastic event and I look forward to seeing you all again in 2023.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor