I’ve set you up as a subscriber and you’ll receive the next issue this coming Sunday.

In the meantime, as a special thanks for subscribing, I’d like to share some of the best content with you.

Just getting started? Start here!

How to Monitor the SRE Golden Signals

We’ve talked a lot here at Monitoring Weekly about Google’s “Golden Signals” (errors, latency, saturation, traffic). This multi-part series dives deep into both what the signals mean in different scenarios and how to collect them for different parts of your infrastructure. Check out the rest of the series too.
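To make the four signals a bit more concrete, here’s a rough Python sketch (not taken from the series) that boils one time window of request records down to them. The `Request` record, treating HTTP 5xx as an error, and using in-flight requests over a hypothetical `capacity` as the saturation proxy are all assumptions for illustration. Real setups usually get these numbers from a metrics system rather than raw request lists.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Hypothetical request record, for illustration only
    latency_ms: float
    status: int  # HTTP status code


def p99(values: List[float]) -> Optional[float]:
    """Nearest-rank 99th percentile, or None if there are no samples."""
    if not values:
        return None
    ordered = sorted(values)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> dict:
    """Summarize one time window of requests into the four golden signals."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    # Latency of successful requests only, so fast failures don't mask slowness
    ok_latencies = [r.latency_ms for r in requests if r.status < 500]
    return {
        "latency_p99_ms": p99(ok_latencies),
        "traffic_rps": total / window_s,
        "error_rate": errors / total if total else 0.0,
        "saturation": in_flight / capacity,  # assumption: simple in-flight/capacity proxy
    }
```

Splitting successful and failed latency is deliberate: a burst of instant 500s can otherwise make latency look better just as things get worse.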

Best Practices for Observability

Not sure where to start on observability? Charity Majors has got you covered with some really helpful tips and best practices.

Monitoring in the time of Cloud Native - Cindy Sridharan

A monster post, chock-full of everything you ever wanted to know about monitoring a cloud-native infrastructure. Grab a cup of coffee/tea, because you’re in for a treat.

Monitoring and Observability with USE and RED

I really love mental models. The USE Method and the Four Golden Signals from the SRE book are two of my favorites for performance analysis and monitoring. I hadn’t seen RED (Rate, Errors, Duration) before, but it looks like a really useful mental model. The author observes that USE and RED are very much complementary, since they cover related but distinct metrics.
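Here’s one way to see the complementary part: USE (Utilization, Saturation, Errors) is measured per resource, while RED is measured per service. The rough Python sketch below (not from the article) just puts the two shapes side by side. The input formats and the choice of the median for Duration are assumptions; many teams track a high percentile or a full histogram instead.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass
class UseMetrics:
    """USE: describes a resource (CPU, disk, NIC, connection pool...)."""
    utilization: float  # fraction of the interval the resource was busy
    saturation: float   # queued work it couldn't service yet (e.g. run-queue length)
    errors: int         # error events reported by the resource


@dataclass
class RedMetrics:
    """RED: describes a service (an endpoint or a microservice)."""
    rate: float          # requests per second
    errors: float        # failed requests per second
    duration_p50: float  # median request duration, in seconds


def use_for_resource(busy_s: float, interval_s: float,
                     avg_queue_len: float, error_count: int) -> UseMetrics:
    return UseMetrics(busy_s / interval_s, avg_queue_len, error_count)


def red_for_service(requests: List[Tuple[float, bool]], interval_s: float) -> RedMetrics:
    """requests holds (duration_seconds, failed) pairs seen during interval_s."""
    durations = [d for d, _ in requests]
    failures = sum(1 for _, failed in requests if failed)
    return RedMetrics(
        rate=len(requests) / interval_s,
        errors=failures / interval_s,
        duration_p50=median(durations) if durations else 0.0,
    )
```

In practice the two meet in the middle: a RED dashboard tells you a service got slow, and USE on the resources behind it tells you which one is saturated.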

Hierarchical Observability with RED

While the RED method works pretty well, it has a major shortcoming: it doesn’t tell you whether a problem lies in the service itself or in one of its dependencies. This article proposes a solution to that while also making the case for standardizing on RED metrics for every service your organization runs.
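The article has its own proposal; the snippet below is only a hypothetical illustration of the underlying idea. It assumes each request record lists the downstream calls made while serving it (the `Call` and `TracedRequest` shapes and the blame rule are made up for this sketch), then splits a service’s error rate into the part it owns and the part explained by a failing dependency, alongside a per-dependency error rate.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    """Hypothetical record of one outbound call made while serving a request."""
    dependency: str
    failed: bool


@dataclass
class TracedRequest:
    """Hypothetical record of one inbound request and the calls it made."""
    failed: bool
    calls: List[Call] = field(default_factory=list)


def split_error_rates(requests: List[TracedRequest]) -> Dict[str, float]:
    """Split a service's error rate into 'own' errors and per-dependency error rates."""
    if not requests:
        return {}
    failed = [r for r in requests if r.failed]
    # Naive blame rule (assumption): a failed request is charged to a dependency
    # if any downstream call made while serving it also failed.
    dep_caused = [r for r in failed if any(c.failed for c in r.calls)]
    rates = {
        "total": len(failed) / len(requests),
        "own": (len(failed) - len(dep_caused)) / len(requests),
    }
    calls: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for r in requests:
        for c in r.calls:
            calls[c.dependency] += 1
            failures[c.dependency] += int(c.failed)
    for dep, n in calls.items():
        rates[f"dep:{dep}"] = failures[dep] / n
    return rates
```

If "own" stays flat while a dependency’s rate climbs, the problem is probably downstream. In a real system this data would come from tracing or from every dependency publishing its own RED metrics.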

Monitoring, Analytics, Diagnostics, Observability, and Root Cause Analysis

A much-needed effort at defining terminology and breaking down the overloaded “monitoring” term. The author goes into more detail, but I’ll include the TLDR here because it’s so good:

“Monitoring is the process of observing systems and testing whether they function correctly. Analytics is the process of turning data (usually behavioral data) into insights. Observability is the property of a system that supports analytics. Diagnostics is the process of determining what’s wrong with a system, and also relies on observability. Root cause analysis is corporate mumbo jumbo.” - Baron Schwartz

Performance metrics. What’s this all about?

Want more about browser performance metrics? Here you go. This article goes into more depth on the current generation of the metrics available to us, what they mean, and how they’re best used.

Looking at Disk Utilization and Saturation

Metrics are great and all… assuming you know what they actually tell you (or don’t tell you). The author takes a hard look at disk utilization and saturation metrics. Skip to the end for the immediate takeaway (spoiler: don’t rely on util%).
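For context on why util% can mislead, here’s a minimal sketch (not code from the article) of how tools like iostat derive it on Linux from /proc/diskstats. util% is the share of wall-clock time the device had at least one I/O in flight (field 10, time spent doing I/Os), while the average queue size comes from the weighted I/O time (field 11). A device that serves many requests in parallel, like an SSD or a RAID array, can sit at 100% util with plenty of headroom left, so queue depth is usually the better saturation hint. The sample lines in the usage are made up.

```python
from typing import Dict, Tuple


def parse_diskstats_line(line: str) -> Dict[str, int]:
    """Pull the two counters we need from one /proc/diskstats line.

    Tokens are: 0 major, 1 minor, 2 device name, then the documented
    fields 1..11, so field N lives at token N + 2.
    """
    t = line.split()
    return {
        "io_ticks_ms": int(t[12]),  # field 10: ms spent doing I/Os
        "weighted_ms": int(t[13]),  # field 11: weighted ms spent doing I/Os
    }


def util_and_queue(before: str, after: str, interval_s: float) -> Tuple[float, float]:
    """Return (util %, average queue size) between two samples of the same device."""
    a, b = parse_diskstats_line(before), parse_diskstats_line(after)
    interval_ms = interval_s * 1000.0
    util_pct = 100.0 * (b["io_ticks_ms"] - a["io_ticks_ms"]) / interval_ms
    avg_queue = (b["weighted_ms"] - a["weighted_ms"]) / interval_ms
    return util_pct, avg_queue


# Made-up samples taken 1 s apart: busy the whole second, ~4 I/Os queued on average
before = "8 0 sda 100 0 800 50 200 0 1600 100 0 1000 5000"
after = "8 0 sda 180 0 1440 90 400 0 3200 200 2 2000 9000"
print(util_and_queue(before, after, 1.0))  # (100.0, 4.0)
```

On a live box you’d read the device’s line from /proc/diskstats twice, a known interval apart. Both samples here say "100% busy", but only the queue size tells you whether requests are actually piling up.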

It’s Never Obvious: About Percentiles

Since I’m in the midst of a stats bender myself, this article on the math behind percentiles is timely. If you want to understand percentiles and use them effectively, it’s a really helpful read.
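One pitfall that comes up a lot with percentiles (whether or not the article frames it this way) is that you can’t average them. The made-up numbers below compute a nearest-rank p99 per host and for the combined samples: averaging the per-host p99s gives 510 ms, while the actual p99 across both hosts is 20 ms.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in ms, 100 requests per host
host_a = [10.0] * 98 + [1000.0] * 2  # mostly fast, two very slow requests
host_b = [20.0] * 100                # uniformly a bit slower

p99_a = percentile(host_a, 99)              # 1000.0
p99_b = percentile(host_b, 99)              # 20.0
averaged = (p99_a + p99_b) / 2              # 510.0, looks alarming
true_p99 = percentile(host_a + host_b, 99)  # 20.0, p99 over all 200 requests
print(p99_a, p99_b, averaged, true_p99)
```

If you need fleet-wide percentiles, aggregate the raw samples (or mergeable histograms) rather than the per-host percentiles.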

Linux Load Averages: Solving the Mystery

Everything you wanted to know but didn’t think to ask about load averages. I’ve always been taught that load average is CPU load average, but apparently that’s not quite true for Linux, which also counts tasks in uninterruptible sleep (usually waiting on disk I/O). It does hold for other *NIX operating systems. There’s some really great stuff in this article.
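The kernel does this in fixed-point C, but the idea fits in a few lines of Python. Roughly every 5 seconds, Linux folds the number of runnable plus uninterruptible tasks into an exponentially damped moving average, one per window (1, 5, and 15 minutes). This floating-point sketch follows that formula (the names are illustrative, not the kernel’s) and shows how a single task stuck on disk I/O pushes the 1-minute load toward 1 even on an otherwise idle CPU.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # Linux updates the averages roughly every 5 seconds
WINDOWS_S = {"1min": 60.0, "5min": 300.0, "15min": 900.0}


def update_load(loads: dict, running: int, uninterruptible: int) -> dict:
    """One update step of the three load averages (floating-point sketch)."""
    # Linux counts tasks in uninterruptible sleep (usually waiting on disk I/O),
    # not just tasks running on or waiting for a CPU.
    active = running + uninterruptible
    updated = {}
    for name, window_s in WINDOWS_S.items():
        decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
        updated[name] = loads[name] * decay + active * (1.0 - decay)
    return updated


# Idle CPU, but one task blocked on disk for 60 seconds (12 updates)
loads = {name: 0.0 for name in WINDOWS_S}
for _ in range(12):
    loads = update_load(loads, running=0, uninterruptible=1)
print({name: round(value, 2) for name, value in loads.items()})
# {'1min': 0.63, '5min': 0.18, '15min': 0.06}
```

After one minute the 1-minute average sits at 1 - 1/e of the steady-state value, which is why load averages lag behind what the system is doing right now.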