Lots of great “in the trenches” stories from a variety of engineering teams out there. Speaking of teams, we’ve got a stack of job postings this week… who’s looking for a new gig?! 💻📈💰
This issue is sponsored by:
What are the 3 trends in cloud-native and observability you need to know?
Tune in for an on-demand discussion with Chronosphere and analyst group ESG as we talk about the market challenges with cloud-native and observability strategies. You’ll learn about the benefits and challenges of cloud-native adoption, the impact of observability on business outcomes, and much more. Register here!
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Notes on an Observability Team
This article speaks to me on a very personal level. I’ve built up Observability teams over the years; it’s not surprising that we share many of the same problems, but it’s always interesting to hear how we tackle (or prioritize) them differently.
Thoughts Over an Annoying Production Issue
Always interesting to read how other engineering teams work through really frustrating incidents.
Monitoring CPU performance of Lyft’s Android applications
A look at how Lyft instruments their Android mobile app to track CPU usage and monitor for performance regressions.
How we avoided alarm fatigue syndrome by managing/reducing the alerting noise
How Doctolib audited and continues to iterate on their noisy alerting behaviors.
What Is Log Aggregation: 101 Guide to Best Tools & Practices
A quick introduction to log aggregation concepts along with a fairly objective comparison of numerous commercial and open source alternatives.
Everyone’s favorite open source dashboard is out with another new release. Love to see the new alert grouping features.
API Observability with Apache APISIX Plugins
If you’re using Apache APISIX already, there are a number of Observability plugins at your disposal. This article brings together a wealth of resources for getting started with the usual observability pillars (metrics, logs, and traces), and how to integrate them within your existing toolset.
CrashLoopBackoff + Four Other K8s Troubleshooting Tips Everyone Should Know
We all love Kubernetes but it can be a hassle to fix when things go sideways. In this webinar, we will cover some of the common problems that plague every Kubernetes user and show you how to fix them. Join us at 10am PT on Thursday, April 28 to add these tips to your troubleshooting toolbox. Save your seat here. (SPONSORED)
An Effective Incident Escalation Process of Sendoso
Great to see more companies talking about their incident management process publicly.
Google Cloud Monitoring: What You Need to Monitor and Why
A helpful guide for friends or peers who might be new to monitoring on Google Cloud.
Improve observability using Stackdriver metrics programmatically
If you’ve been wanting to pull metrics out of the Google Cloud Monitoring API, this article has you covered. Props to the author for including a GitHub project with examples.
Create Monitoring & Alerting for Webhook Errors using Datadog
A look at Xendit’s pattern for monitoring outgoing webhook failures.
Monitorama PDX 2022 - June 27-29 (Portland, OR)
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Principal SRE - Logging, Metrics, and Monitoring at athenahealth (US Remote)
Lead Developer - Cloud Infrastructure Engineering at athenahealth (US Remote)
Software Engineer - SRE at Barracuda (Remote)
Senior Software Engineer - SRE at Barracuda (Remote)
Principal Software Engineer - SRE at Barracuda (Remote)
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor