A great week for monitoring articles, with some unexpected takes and new-to-me tools. I especially enjoyed the post about detecting silent data corruptions at Facebook. 🧠💥

This issue is sponsored by:

71% of organizations feel that their observability data is growing at a concerning rate

Chronosphere is the only observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Find out the top observability concerns in 2022 in this blog.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Detecting silent errors in the wild: Combining two novel approaches to quickly detect silent data corruptions at scale

A fascinating post from Facebook on using opportunistic (out-of-production) and ripple (in-production) testing to detect silent data corruptions in the wild.

Making the Case for Observability (to your boss)

Most of us can take monitoring systems for granted, but others aren’t always so lucky. This article should help you build a strong case for your own observability resources.

Monitoring and Observability

A thorough write-up on monitoring, observability, why you need them, and how to leverage both for effective troubleshooting.

Who watches the watchers?

How to build a simple but effective dead man’s switch for monitoring your observability stack.
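For a rough idea of the pattern, here is a generic sketch in Python. It is not the linked article's implementation, and the `DeadMansSwitch` class, its `timeout_seconds` parameter, and the injectable `clock` are all hypothetical names chosen for illustration. The switch alerts on the absence of a signal: your monitoring stack sends a regular heartbeat, and an independent checker trips when the heartbeats stop.

```python
import time


class DeadMansSwitch:
    """Trips when heartbeats stop arriving (hypothetical illustration)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        # clock is injectable so the behavior can be tested without sleeping.
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_heartbeat = clock()

    def heartbeat(self):
        """Record that the monitored system just checked in."""
        self._last_heartbeat = self._clock()

    def is_tripped(self):
        """Return True if no heartbeat arrived within the timeout window."""
        return self._clock() - self._last_heartbeat > self.timeout_seconds
```

In practice the checker would run somewhere outside the stack it watches and page through a separate channel, so a single failure cannot silence both.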

Monitoring Production EKS cluster using Prometheus & Grafana

Deploying your Prometheus and Grafana stack with a Helm chart in EKS with an eye towards minimal drift over time.

On-demand deployment environments with K8s at SIG

Kubernetes can be a fantastic tool for a wide variety of use cases, but how do you monitor on-demand environments in a lightweight and repeatable fashion? SIG has a pattern in use for their developers that seems to work well.

Thoughts on Observability

I don’t typically link articles that feel “wrong” to me, but this is still a good post from a Shopify engineer thinking about observable systems. I would argue that we need both internal observability (e.g. logs, traces, instrumented metrics) and external performance metrics. What do you think?

Data’s growing at over 20% YoY, but most budgets are not.

Cribl is advocating for an open ecosystem for data with the introduction of an Observability Lake, an open, vendor-neutral place to cheaply store data in open formats. Add this to an observability pipeline to help route logs, metrics, and traces regardless of source, and you can avoid vendor lock-in and take control of all your observability data. Learn more about it from CEO Clint Sharp here. (SPONSORED)

Taking care of your loved ones with Grafana and other open source solutions

More proof that open source software is eating the world. How one developer uses Grafana with smart home sensors to monitor his aging father’s health and activity. Privacy concerns notwithstanding, it’s great to see humans directly benefiting from these tools.

Building a Simple, Pure-Rust, Async Apache Kafka Client

If you use Kafka in your observability stack, you might be interested in this simple client written in Rust.

Synthetic Monitoring Tools

A totally unbiased take on the importance of synthetic monitoring in your observability toolkit. All kidding aside, this is a good primer on the different monitoring approaches as you work up the OSI model, and why synthetic monitoring might be suitable for your application.

How to Detect Memory Leaks in Java: Common Causes & Best Tools to Avoid Them

Good advice for avoiding Java memory leaks and finding the ones already lurking in your code.

A minimal Rust client for Apache Kafka


K9s - Kubernetes CLI To Manage Your Clusters In Style!


Call for Participation - Monitorama PDX 2022

Monitorama is returning to Portland, OR this summer. The organizers have recently opened up their CFP for a limited number of speaking slots. Deadline for submissions is March 31, 2022.

Job Opportunities

Staff Site Reliability Engineer at Ada (NA Remote)

Senior Infrastructure Engineer at strongDM (Remote)

Platform Engineer at Informed.IQ (US Remote)

Staff Engineer, Software Architecture at Stripe (NA Remote)

Software Engineer, Software Architecture at Stripe (NA Remote)

Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year, more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor