Issue 100

Did you know I have a podcast too? Check it out: Real World Devops

This issue is sponsored by:

SignalFx: Monitor What Matters Most and Diagnose Anomalies in a Matter of Seconds

When it’s time to troubleshoot an issue, are you providing the right monitoring signals to your team? SignalFx APM helps by providing full distributed tracing, anomaly detection, and predictive analytics – all right out of the box.

Latest on monitoring.love

I’ve got some great news: I’m joining forces with Corey Quinn over at Last Week in AWS / Screaming in the Cloud. You can read the announcement right here.

From The Community

Take DORA’s Accelerate: State of DevOps Survey

Definitely the best and most interesting survey in our little slice of tech. You should take it. If you want a reminder of what’s in store, have a look at last year’s report.

Linux Kernel Observability through eBPF

There are two kinds of people in the world: those who love eBPF and those who haven’t used it yet.

Amazon is going for Elastic’s throat or Elastic is doing some hinky stuff to their codebase, depending on who you ask

This week’s tirefire started with AWS’s announcement of the Open Distro for Elasticsearch. There’s been some fun commentary on it on Twitter, various Slack groups, and other vendor blogs (e.g., AWS intends for their new project to be an Elasticsearch fork - InfluxData). Depending on who you ask, this is either AWS going after one of its partners (again) or an inevitable result of Elastic intermingling proprietary code with open-source code. Elastic had a great public response, though.

Postmortems Part 3: Getting the Most out of Your Postmortem Meetings

From the article: “The goal of the postmortem meeting is to deepen understanding of incident causes and get buy-in for action items so that they actually get done.”

Datadog Log Management from Zero to One

Though they’re using Datadog to show the data, this is actually a pretty cool article about queuing theory and performance analysis. From the article: “In this blogpost, I demonstrated how Queueing Delay influences Response Time under a high arrival rate on a single-threaded application server. Tracking both Service Time and Queueing Delay is necessary for capacity planning and performance modelling.”

Setting up comprehensive centralized logging with AWS Services for Kubernetes

Linking this mainly because it just introduced me to a new tool: Collectord.

Structured Logging: The Best Friend You’ll Want When Things Go Wrong

The folks at Grab are here to preach the good word of structured logging. <3

Crafting a Resilient Culture: Or, How to Survive an Accidental Mid-Day Production Incident

The fantastic Ryn Daniels relates a story of an incident from their time at Etsy, complete with misbehaving Chef, unexpected systems behavior, and what they learned from it.

npm On-Call

The folks behind npm walk us through what their on-call process looks like.

This issue is sponsored by:

Scalyr: We’re hosting an online workshop on Tue 3/26 at 9:00am PT on building, deploying and monitoring containers. Sylvia Fronczak (Software Engineer) and Dave McAllister (Scalyr Community Guy) will show live code and examples to accompany container orchestration concepts. They’ll also show how to get started with monitoring containers. Sign up for the online workshop.

Events

LogicMonitor Level Up Conference - June 24-26, 2019 - Austin, TX USA

For the LogicMonitor fans among you, the LogicMonitor Level Up conference is this June. I’ll be speaking, so come hang out/heckle!

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor