Some in-depth articles this week, with an emphasis on production observability and scaling your systems. Great stuff all around. Enjoy! 🚢🐧☕

This issue is sponsored by:

FireHydrant

Structure your process with a single source of truth and configurable step-by-step Runbooks. Automate declaration, assembly, and communication to move faster and more uniformly. Improve your systems with insights from incident analytics for true reliability gains. Get started for free or book a demo at

Articles & News on

Observability & Monitoring Community Slack

It’s amazing to see the community continue to grow. We’d love to have you join us and share what you’ve been working on.

From The Community

Use Responsive Observability to Deliver Improved User Satisfaction

This article makes a strong case for the outcomes that are possible when we introduce observability with an eye towards the user experience. Yes, please.

Log names, buckets and scopes

Logging with cloud providers is generally pretty straightforward, but the devil is in the details. This post clears up some of the peculiarities of logging in Google Cloud.

Enable Alerts from AWS CloudWatch through Mail

When Slack alerts are coming at you too fast and /dev/null is too drastic a measure, what’s a chill DevOps Engineer to do? All kidding aside, there are probably some times when email is “just right” for alerts, and this guide will help you glue together your AWS services to make it happen.

Hodor: Overload scenarios and the evolution of their detection and handling

A follow-up to LinkedIn’s earlier story introducing HODOR, their overload detection and remediation framework. In this article we learn more about how HODOR is used along with some new detectors they’ve added to the suite. If you’re at all interested in designing monitoring systems, you’re going to love this one.

OpenTelemetry — Mastering the basic main concepts

OpenTelemetry is a huge step forward in terms of standardizing the instrumentation and collection of observability data. But it can also feel like chewing an elephant to get it adopted and used effectively. This post attempts to cut through the noise and simplify the concepts of OpenTelemetry to help you get started on your journey.

Husky: Exactly-Once Ingestion and Multi-Tenancy at Scale

I love reading how other engineers design their metrics systems to perform. As one of the largest observability vendors, Datadog has had to overcome significant scaling challenges as they’ve grown. This deep-dive offers an insightful look at how they designed their newest event store.

🏡 Your data needs a home

In order to use your data effectively, you need to send it to an endpoint and visualize it to get the metrics you need. MetricFire provides that endpoint so you can save time and money in development work. It also gives you the visibility and custom dashboards you need. Learn how you can use MetricFire here. (SPONSORED)

Kubernetes Logging with Grafana Loki & Promtail in under 10 minutes

Your boss asks you to stand up a Kubernetes cluster and get it monitored in under 15 minutes. Ok, you’ve got K8s going but now you’re left with 10 minutes… what to do?! Fortunately, this guide will get you there with seconds to spare. Get moving!

How to extract label values from Prometheus metrics in Grafana

Prometheus labels are awesome, often used to capture metadata or even to represent the values themselves (e.g. software versions). This guide walks you through the challenging task of extracting them for practical uses in Grafana.
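
If the idea of "labels as values" is new to you, here is a rough sketch of it in plain Python. It is not the method from the linked guide. The `app_build_info` metric, its `service` and `version` labels, and the `label_values` helper are all made up for illustration; the helper only loosely mirrors what Grafana's `label_values(metric, label)` variable query gives you for a Prometheus data source.

```python
import re

# Hypothetical sample in the Prometheus text exposition format.
# The metric and label names are invented for this example.
SAMPLE = '''
app_build_info{service="api",version="1.4.2"} 1
app_build_info{service="worker",version="1.3.9"} 1
'''

# Matches name="value" pairs; a simplification that ignores escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def label_values(text, metric, label):
    """Return the sorted distinct values of `label` on series named `metric`."""
    values = set()
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(metric + "{"):
            continue
        # Take everything between the opening brace and the last closing brace.
        labels_part = line[len(metric) + 1 : line.rindex("}")]
        labels = dict(LABEL_RE.findall(labels_part))
        if label in labels:
            values.add(labels[label])
    return sorted(values)


print(label_values(SAMPLE, "app_build_info", "version"))
```

The sample value on each series is just `1`; the interesting data (the software version) lives entirely in the label, which is why pulling it out takes an extra step.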

Incident travel time

Although short on actionable takeaways, this post on incidents is a good reminder that the quality of our response is almost as important as the remediation itself.


Monitorama PDX 2023 - June 26-28 (Portland, OR)

We’re really looking forward to this event, which marks the ten-year anniversary of Monitorama 2013, originally held in Boston, MA. Proposals are currently being reviewed and if they’re anything to go by, this should be an awesome lineup of talks. Hope to see you there!

Job Opportunities

Senior SRE - Big Data at Hive Collective (US Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor