Issue 167

A fun week of troubleshooting stories and some guides for automating your monitoring and observability tooling. Oh, and don’t forget that SLOconf’s virtual event happens this week. Enjoy!

This issue is sponsored by:

incident.io has joined your #general Slack channel.

👋 I'm here to sponsor this issue and automate your entire incident management process in Slack. You just focus on fixing the issue, I'll keep your team and status page updated, nudge you to take the important actions, escalate to the right person when needed, auto-generate your post-mortem and make sure follow-up actions are taken care of.

Install incident.io to your Slack, type /incident and I'll take care of the rest.

incident.io has left the chat.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Distributed Tracing: The Why, What, and How?

One of the better articles I’ve read on Distributed Tracing, with some helpful analogies and context to help newcomers develop a foundational understanding of this key observability principle.

Operation Jumbo Drop: How sending large packets broke our AWS network

Regular readers will know I love a good network troubleshooting story. This one might feel familiar if you’ve ever debugged MTU… with some new twists courtesy of AWS Transit Gateways.

A Practical Guide to App Monitoring with Datadog for Python APIs

How to get started with Datadog monitoring for most Python-based web API services (e.g. Gunicorn, Flask).

How We Solved the Thundering Herd Problem

How Braintree engineers diagnosed, iterated, and solved a thundering herd issue affecting their processor service.

IAC with Google Cloud Monitoring

A useful pattern for automating the creation of uptime checks (i.e. DIY Pingdom) in Google Cloud.

Brendan@Intel.com

It’s unusual for us to call out one individual’s job change in this newsletter, but Brendan Gregg has had such a profound impact on our industry (with a particular emphasis on performance monitoring and debugging) that this is a notable event. I’m very excited to hear that he’s joining Intel and will be continuing his work with eBPF and other open source projects.

Observability Mythbusters: Logs and Metrics Aren’t Enough

I don’t agree with everything the author is presenting, but I laud them for at least considering other perspectives. Feels like bias might be involved on both sides, tbqh.

See how companies like DoorDash are “no longer flying blind” with increased visibility and reliability from Chronosphere’s end-to-end solution. Chronosphere is the only observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Learn more here. (SPONSORED)

The Observant Developer — Part 1

The first part of an upcoming series, this post explains why Observability matters to non-DevOps engineering teams.

Prometheus 2.35 – What’s new?

Sysdig has started a new series to highlight important changes in Prometheus releases. This is a nice addition for those of us in the community who might not otherwise have time to parse the release notes.

The GitOps Way for Consistent Monitoring

A GitOps-friendly pattern for automating Grafana dashboards.

New in Grafana Tempo 1.4: Introducing the metrics generator

A quick look at the latest Grafana Tempo release, with an emphasis on its ability to emit RED metrics by default for traces. Nice update all around.

Tools

kris-nova/xpid

“xpid gives a user the ability to “investigate” for process details on a Linux system.”

Events

SLOconf - Service Level Objective Conference 2022

SLOconf is back again as a virtual event, taking place online May 9-12. Looks like a lot of familiar faces; looking forward to this one.

Monitorama PDX 2022 - June 27-29 (Portland, OR)

Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!

Job Opportunities

Senior SRE at athenahealth (US Remote)

Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor