Some exciting announcements and discussions this week. Big news for Kubernetes container probes, a new book on Observability, some fresh takes on retrospectives, and much more. Enjoy! 😍📖🔥
This issue is sponsored by:
Say goodbye to manual evidence collection and hello to automated compliance. Drata, G2’s highest-rated cloud compliance software, offers 60+ integrations that seamlessly connect with the tech stacks you use to manage compliance across your organization. Monitoring Weekly readers get 10% off Drata here.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Lessons learned from a QA engineer on how Observability has empowered them to be more active in troubleshooting and to provide more useful feedback for developers.
A look at the role of logs in supporting production workloads along with some helpful tips and best practices to make them even more effective.
A fun post from Pinterest about performance and efficiency gains in their caching infrastructure made possible by their monitoring data.
Great to see gRPC probes finally reach beta status in Kubernetes 1.24, meaning they’re now enabled by default. This should be a big win for liveness, readiness, and startup checks on gRPC services running in your clusters.
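For a rough idea of what a gRPC probe looks like in a Pod spec, here is a minimal sketch that builds the manifest as a Python dict and dumps it as JSON (which kubectl accepts alongside YAML). The Pod name, image, port 9090, and the "readiness" service name are all made up for illustration, and the container is assumed to implement the standard gRPC Health Checking Protocol on that port.

```python
import json


def grpc_probe(port, service=None, initial_delay=5, period=10):
    """Build a probe stanza that uses the `grpc` handler.

    `service` is optional; when set, it is passed as the service name
    in the health check request.
    """
    handler = {"port": port}
    if service is not None:
        handler["service"] = service
    return {
        "grpc": handler,
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
    }


# Hypothetical Pod: name, image, and port are placeholders, not from the article.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "grpc-demo"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "example.com/grpc-app:latest",
                "ports": [{"containerPort": 9090}],
                "livenessProbe": grpc_probe(9090),
                "readinessProbe": grpc_probe(9090, service="readiness"),
            }
        ]
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saving the output to a file and running kubectl apply -f against it should be enough to try this on a 1.24+ cluster, assuming the image really does serve the health service.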
We talk a lot about the OpenTelemetry framework in terms of traces and spans, but it provides enormous value in the form of metrics as well. This post is an excellent guide to what makes OTel metrics unique, how to set them up, and when to use the various types.
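To make "the various types" a little more concrete, here is a plain-Python sketch of how three common instrument kinds behave. It is not the OpenTelemetry SDK and is not taken from the linked post; the classes only borrow the add/record method names, and the bucket boundaries and sample values are made up.

```python
import bisect


class Counter:
    """Monotonic sum: only ever goes up (e.g. requests served)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            raise ValueError("a counter only accepts non-negative increments")
        self.value += amount


class UpDownCounter:
    """Non-monotonic sum: can rise and fall (e.g. requests in flight)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


class Histogram:
    """Distribution of recorded values, kept as per-bucket counts."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One bucket per upper boundary, plus an overflow bucket at the end.
        self.counts = [0] * (len(self.boundaries) + 1)
        self.sum = 0.0

    def record(self, value):
        # bisect_left puts a value equal to a boundary in that boundary's
        # bucket, so upper bounds are inclusive.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1
        self.sum += value


requests = Counter()
in_flight = UpDownCounter()
latency_ms = Histogram([10, 100, 1000])

for duration in (3, 10, 42, 5000):
    requests.add(1)
    in_flight.add(1)
    latency_ms.record(duration)
    in_flight.add(-1)

print(requests.value, in_flight.value, latency_ms.counts)
```

The rule of thumb this illustrates: reach for a counter when you only care about a running total, an up-down counter when the value can shrink, and a histogram when the shape of the distribution matters more than the sum.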
O’Reilly’s new Observability Engineering book has been released, and Honeycomb has made the entire eBook available as a free download. Looking forward to reading this one soon.
Spotify engineers performed a retrospective on all of their incidents from 2021 in an effort to understand how they performed overall, and to uncover any missed learning opportunities. Good stuff, and something I wish more companies provided the space to emulate.
incident.io has joined your #general Slack channel.
👋 I'm here to sponsor this issue and automate your entire incident management process in Slack. You just focus on fixing the issue, I'll keep your team and status page updated, nudge you to take the important actions, escalate to the right person when needed, auto-generate your post-mortem and make sure follow-up actions are taken care of.
Install incident.io to your Slack, type /incident and I'll take care of the rest.
incident.io has left the chat. (SPONSORED)
Databases can benefit from observability as much as all the other services we run in production. With the help of OpenTelemetry we can surface hidden dependencies that might otherwise be difficult to uncover or debug.
Some tips and caveats to watch out for when managing your postmortems.
Facebook engineers have developed an internal framework allowing them to isolate entire infrastructures into “vacuum-sealed” sandboxes, suitable for resiliency and recovery experimentation. It doesn’t sound like this project will be released as open source, but it serves as an interesting case study and pattern for developing our own pre-production environments.
A recap and summary of the lessons learned from Atlassian’s extended outage.
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
OSMC is back again for 2022, taking place November 14-16 in Nuremberg. CfP submissions are being accepted through July 31, 2022.
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor