I hope you had a great week and are ready for an equally great newsletter. This week’s theme is all about resiliency in the face of complex systems. Grab your favorite drink and cuddle up to this week’s collection of articles from across the globe. Enjoy! 💔💾🔥

This issue is sponsored by:


There are a LOT of questions surrounding observability. Heck, is observability even a noun or a verb? To answer some of those burning questions, Chronosphere teamed up with Forrester to discuss what cloud native observability really is and why observability can lead to better business outcomes. Read our latest blog summarizing the discussion.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

The Career, Accomplishments, and Impact of Richard I. Cook: A Life in Many Acts

It’s come to my attention that Dr. Richard Cook recently passed away. His research influences the way we think about complex system failures and how to engineer resilience into these systems. Dr. Cook will be missed, but our industry is better for having intersected with his life’s work.

Observability is becoming mission critical, but who watches the watchmen?

It’s not impossible to ensure a highly resilient observability stack, but it takes careful planning and mindful execution. Glad to see someone covering this topic; I’d like to see more about it.

Prometheus Monitoring. Easy Explained.

Unusual title, but this looks like a handy introduction to Prometheus concepts and setup. Probably a good one to share with colleagues who are less familiar with the space.

Distributed Tracing in Rust

This might be one of the best guides I’ve seen for setting up distributed tracing in your application, from instrumentation through visualization. If you’re a Rust programmer (or a wanna-be Rustacean like me), you should absolutely check this one out.

kOps K8s Control Plane Monitoring with Datadog

If you’re using kOps to provision your Kubernetes control plane, you might have already hit the proverbial wall with Datadog’s native integrations. This guide should help you get things hooked up properly.

A Quick Introduction to Top Metrics & Tools to Track the Kubernetes Observability

An overview of monitoring considerations and tooling suggestions for Kubernetes clusters.

Key metrics for AWS monitoring

Some of the more important metrics representing failure modes across a variety of AWS services.

MemLab: An open source framework for finding JavaScript memory leaks

Anyone who’s ever worked on frontend code with custom JavaScript knows how easy it is to introduce memory leaks. Facebook has released their memory testing and leak detection framework (MemLab) as open source.

Is your plugin compatible with Grafana? There’s a tool for that!

If you maintain your own Grafana plugin, there’s now a tool to help ensure compatibility with new Grafana versions.



A framework for finding JavaScript memory leaks and analyzing heap snapshots


A tool for helping to understand APIs exported and consumed by NPM packages (or any TypeScript code).

Job Opportunities

Site Reliability Engineer at Leadfeeder (Remote)

Senior Site Reliability Engineer at Sentry (NA Remote)

Site Reliability Engineer at Sentry (NA Remote)

SRE / DevOps Engineer (US Remote)

Backend Engineer - Observability Infrastructure at Spotify (NYC, US)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor