Loving the variety this week, with some standout posts on alerting, metrics, instrumentation, and an oncall handoff bot. Enjoy! 📈🤖🦜

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Go Microservices: Monitoring, Logging, Debugging, Tracing, and Profiling

An excellent primer on Observability from the perspective of a Go programmer. Even if your jam isn’t Golang, the examples and context are approachable for anyone who codes.

Keep your dashboard clean: Acknowledgement is not a solution!

Tips on crafting sustainable alerts and ways to avoid alert fatigue. Sage advice, worth a read.

Build an Alert System for Monitoring AWS ECS Task Termination

You might not need this today, but I wager this would be a good pattern to add to your cloud-native monitoring toolkit.

Intro to AWS Monitoring & Messages Processing Tool

This is a unique post that feels pertinent to this audience. Obviously not all of these services are monitoring-specific, but I like that the author covers their relevance in the context of monitoring infrastructure.

Concept Article: Circuit breaker pattern monitoring

Interesting pattern and not one I recall seeing in production. Then again, and maybe I’m just being old here, what happened to simply designing HA systems in a way that didn’t require breaking the “circuit” and managing the open state? Maybe someone can pop into our community Slack and clue me in. 😅

Simplifying Spring Observability with OpenTelemetry Auto-Instrumentation and Java Agent

Quick review of auto-instrumentation for Spring Boot apps with OpenTelemetry and Java Agents.

Refining Incident Management with Metrics and Looking to the Future at Dyninno Group

How Dyninno incorporates service level indicators and a more thorough feedback cycle to mature their incident response processes.

Creating An On-call Handoff Bot

A fun read by a Disney engineer about a custom bot written to ensure that open issues don’t get dropped on the floor between oncall rotations. Sadly, there don’t appear to be any links to an OSS project, but the author provides more than enough detail to write your own (and that’s half the fun, right?).

Key Metrics for Monitoring Etcd

A super detailed breakdown of the metrics you should care about to monitor Etcd effectively (even if you’re not using Datadog).

Karpenter: its monitoring, and Grafana dashboard for Kubernetes WorkerNodes

A detailed walkthrough of setting up a VictoriaMetrics backend and Grafana frontend, choosing the Karpenter and Kubernetes metrics we should care about, and crafting a dashboard to monitor it all effectively.

Events

Monitorama PDX 2024 - $100 off General Admission

Early Bird tickets are gone, but Monitoring Weekly readers can still save $100 on General Admission tickets with the MWEEKLY2024 discount code. This year’s agenda is shaping up to be a banger. Don’t miss out!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor