This issue is sponsored by:

Research Report: How Can You Tell If Your DNS Is Completely Secure?

81% of the world’s busiest domains are open to outages because of poor DNS setup. See how yours compares across several DNS security measures, learn how you might be in jeopardy, and find the three easy fixes to make. Get Panopta’s research report, The Perilous State of Global Web Domains in 2019, today.

Latest Articles on

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

DevOps is Dead with James Turnbull - Real World DevOps

Probably the most common question I received when I told people I was writing a book about monitoring was, “Have you read James Turnbull’s book?” I’m putting that to rest with a delightful conversation with James Turnbull on a variety of topics, including which of his own books is his favorite, some not-so-subtle digs at Kubernetes, and why James thinks DevOps is dead.

From The Community

The New Rules of Sampling

Almost certain to start debates in private channels, let’s talk about sampling.

Top Node.js Metrics to Monitor

For the Node folks among you, this should be handy.

Domain-Oriented Observability

“… how do we add observability to what we care about the most, our business logic, without clogging up our codebase with instrumentation details? And, if this instrumentation is important, how do we test that we’ve implemented it correctly?”

My CloudWatch Logs Are How Old?

It’s interesting to me that CloudWatch Logs have a default expiry of “never,” but as someone who looks at a lot of AWS bills, it’s thankfully never at an “oh shit” level. Still though, maybe a good idea to set expiration on them to keep things clean.
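If you want to script that cleanup, here’s a rough sketch of one way to do it. It assumes you hand it a boto3 CloudWatch Logs client, and it leans on two real boto3 calls: the `describe_log_groups` paginator and `put_retention_policy`. The function name `set_default_retention` and the 30-day default are my own illustration, not something from the linked article.

```python
def set_default_retention(logs_client, days=30):
    """Give every never-expiring log group a retention policy.

    logs_client is assumed to be a boto3 CloudWatch Logs client.
    Returns the names of the log groups that were changed.
    """
    changed = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            # Groups set to "never expire" have no retentionInDays key.
            if "retentionInDays" in group:
                continue
            logs_client.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=days,
            )
            changed.append(group["logGroupName"])
    return changed
```

With boto3 installed and credentials configured, calling `set_default_retention(boto3.client("logs"))` should be all it takes. CloudWatch only accepts certain retention values (1, 3, 5, 7, 14, 30, and so on), so check the allowed list before picking a different number. Log groups that already have a policy are left alone.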

The Difference Between Goals, Strategies, Metrics, OKRs, KPIs, and KRIs

I’m always a big fan of taking lessons from outside of tech, so here’s one from the wider world of business strategy.

Recapping Datadog Summit Seattle 2019

In case you couldn’t make it to the Seattle Datadog Summit, Datadog has published the talks. There’s some great stuff in here; I really liked the one at the bottom about SLIs and SLOs.

Comcast/kuberhealthy: Easy synthetic testing for Kubernetes clusters.

I’m having trouble reconciling how the #15 most hated company in America has such great technical staff, but here we are.

monzo/response: Monzo’s real-time incident response and reporting tool

This is a super handy tool for making incident management in Slack less painful. Be sure to watch the video too, which goes into some detail about the thoughts behind it and how Monzo uses it.

vassilevsky/sidekiq-influxdb: Writes Sidekiq job execution metrics to InfluxDB

After my mention of the Sidekiq monitoring article last week, a reader was kind enough to send me a tool they wrote that makes it even easier. Enjoy!

Failure is Familiar, Safety is Surprising

Oh man, so good: “Success is invisible. That is, the work that goes into creating the conditions for success can be difficult to describe or see. It is driven by our expertise and collective tacit knowledge. This seems a paradox, that we could be successful yet not fully understand the factors that contribute to things going “right”.”

Building a Culture of Observability within your Organization

Based on years of consulting on this exact topic, I’ve found that better observability is rarely a technical problem. This video is great and should help you get more people on board with the idea.

Nines are Not Enough: Meaningful Metrics for Clouds

The folks at Google are presenting this paper later this month and it’s a fascinating read. From the paper: “We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We argue that a mutually beneficial set of Service Level Expectations (SLEs) and Customer Behavior Expectations (CBEs) ameliorates many of the problems of today’s SLOs by explicitly sharing risk between customer and service provider.”

This issue is sponsored by:

Monitoring on-prem infrastructure with…StackDriver?!

Yes, it’s a thing! Blue Medora helps you integrate your on-prem infrastructure and your cloud infrastructure into one place. Rather than making your users learn yet another monitoring tool, Blue Medora acts as a bridge, transparently shipping metrics from your datacenter hardware to monitoring tools of your choice.


Monitorama Baltimore 2019 - October 21-22, 2019 - Baltimore, MD USA

Yes, you read that right: Monitorama is doing a new event on the American east coast! I’m super excited.

Monitoring & AIOps Meetup - May 22nd, 2019 - Mountain View, CA USA

I’m emceeing this meetup in Mountain View later this month, with two awesome speakers: Stefan Apitz and J Paul Reed. You should come out for it!

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor