Issue 100
Did you know I have a podcast too? Check it out: Real World Devops
This issue is sponsored by:
Monitor What Matters Most and Diagnose Anomalies in a Matter of Seconds
When it’s time to troubleshoot an issue, are you providing the right monitoring signals to your team? SignalFx APM helps by providing full distributed tracing, anomaly detection, and predictive analytics – all right out of the box.
Latest on monitoring.love
I’ve got some great news: I’m joining forces with Corey Quinn over at Last Week in AWS / Screaming in the Cloud. You can read the announcement right here.
From The Community
Take DORA’s Accelerate: State of DevOps Survey
Definitely the best and most interesting survey in our little slice of tech. You should take it. If you want a reminder of what’s in store, have a look at last year’s report.
Linux Kernel Observability through eBPF
There are two kinds of people in the world: those who love eBPF and those who haven’t used it yet.
Amazon is going for Elastic’s throat or Elastic is doing some hinky stuff to their codebase, depending on who you ask
This week’s tirefire started with AWS’s announcement of the Open Distro for Elasticsearch. There’s been some fun commentary on it on Twitter, various Slack groups, and other vendor blogs (e.g., AWS intends for their new project to be an Elasticsearch fork - InfluxData). Depending on who you ask, this is either AWS going after one of its partners (again) or an inevitable result of Elastic intermingling proprietary code with open-source code. Elastic had a great public response, though.
Postmortems Part 3: Getting the Most out of Your Postmortem Meetings
From the article: “The goal of the postmortem meeting is to deepen understanding of incident causes and get buy-in for action items so that they actually get done.”
Datadog Log Management from Zero to One
Though they’re using Datadog to show the data, this is actually a pretty cool article about queuing theory and performance analysis. From the article: “In this blogpost, I demonstrated how Queueing Delay influences Response Time under a high arrival rate on a single-threaded application server. Tracking both Service Time and Queueing Delay is necessary for capacity planning and performance modelling.”
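If you want to see why queueing delay dominates as arrival rate climbs, here is a minimal sketch. It assumes a textbook M/M/1 queue (one server, random arrivals, exponentially distributed service times), which is my simplification and not the model or code from the article. The function name and the 10 ms service time are made up for illustration.

```python
from typing import Tuple


def mm1_response_time(arrival_rate: float, service_time: float) -> Tuple[float, float]:
    """Return (queueing_delay, response_time) in seconds for an M/M/1 queue.

    arrival_rate is in requests per second, service_time in seconds per request.
    Assumption: M/M/1 model, chosen for illustration only.
    """
    utilization = arrival_rate * service_time
    if not 0 <= utilization < 1:
        raise ValueError("unstable queue: utilization must be below 1")
    # M/M/1 mean response time: R = S / (1 - utilization)
    response_time = service_time / (1.0 - utilization)
    # Response time is service time plus time spent waiting in the queue.
    queueing_delay = response_time - service_time
    return queueing_delay, response_time


# Hypothetical single-threaded server that needs 10 ms per request.
for rate in (10, 50, 90, 99):
    delay, response = mm1_response_time(rate, 0.010)
    print(f"{rate:>3} req/s: queueing delay {delay * 1000:.2f} ms, "
          f"response time {response * 1000:.2f} ms")
```

Service time stays flat in this model while queueing delay grows without bound as utilization approaches 1, which is why tracking only one of the two hides the problem.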
Setting up comprehensive centralized logging with AWS Services for Kubernetes
Linking this mainly because it just introduced me to a new tool: Collectord.
Structured Logging: The Best Friend You’ll Want When Things Go Wrong
The folks at Grab are here to preach the good word of structured logging. <3
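If structured logging is new to you, the idea is to emit each log entry as machine-parseable key/value data instead of a free-form string. Here is a minimal sketch using only Python's standard library. It is not Grab's implementation; the logger name, the `fields` key, and the example values are all hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "fields" is a hypothetical convention: callers pass extra={"fields": {...}}
        # and logging attaches that dict to the record as record.fields.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


log = build_logger(sys.stdout)
log.info("payment failed", extra={"fields": {"order_id": "A-1001", "retry": 2}})
```

Because every entry carries the same named keys, you can filter on something like `order_id` during an incident instead of writing regexes against prose.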
Crafting a Resilient Culture: Or, How to Survive an Accidental Mid-Day Production Incident
The fantastic Ryn Daniels relates a story of an incident from their time at Etsy, complete with misbehaving Chef, unexpected systems behavior, and what they learned from it.
The folks behind npm walk us through what their on-call process looks like.
This issue is sponsored by:
We’re hosting an online workshop on Tue 3/26 at 9:00am PT on building, deploying and monitoring containers. Sylvia Fronczak (Software Engineer) and Dave McAllister (Scalyr Community Guy) will show live code and examples to accompany container orchestration concepts. They’ll also show how to get started with monitoring containers. Sign up for the online workshop.
Events
LogicMonitor Level Up Conference - June 24-26, 2019 - Austin, TX USA
For the LogicMonitor fans among you, the LogicMonitor Level Up conference is this June. I’ll be speaking, so come hang out/heckle!
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor