Issue 236

Service discovery and alerting are definitely the themes for today’s newsletter. Also make sure to check out the Grafanalib article for managing your Grafana dashboards. Enjoy! 📈🍂☕

This issue is sponsored by:

The tl;dr on DORA for incident management

The 2023 DORA report has two conclusions with big impacts on incident management: incremental steps matter, and good culture contributes to performance. Dig into both topics and explore ideas for making incremental improvements of your own in this ebook from FireHydrant.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Alert and Alert Manager in Prometheus

In my experience, Alertmanager tends to be one of those tools you learn through shadowing and tribal knowledge. This post cuts through a lot of that and goes beyond basic setup tips to demystify some of the less obvious aspects of using it in real scenarios.

Sofia’s Journey into Observability: From Stage Fright to Kubernetes Insight

The next chapter in Sofia’s journey takes us into some dark places, seeking truths around Kubernetes service discovery.

How Grafanalib Helps You Manage Dashboards at Scale

Managing the sprawl of Grafana dashboards and data while keeping them discoverable is a common challenge for most organizations. This post introduces a pattern with the Grafanalib library that sounds like a good option for many.

A Guide to Service Discovery with Prometheus Operator

Speaking of Kubernetes service discovery, here’s a less fictional take on the choices available to us.

Profiling: Flame Chart vs. Flame Graph

I honestly never gave much thought to the differences between flame charts and graphs before reading this article. A pretty handy guide for understanding when to use each visualization.

Not a Single Trace

Sound advice for anyone working on an unfamiliar-to-them system or who’s otherwise susceptible to overreacting. 😅

Using Kubernetes to power your infrastructure and services? Learn why Prometheus is the natural choice for monitoring your environment and ensuring your Kubernetes cluster manager and the services running on top of it are always healthy and behaving as expected in this recent article from Chronosphere. (SPONSORED)

Alert Fatigue in DevOps

Your annual reminder to practice sustainable, actionable alerting.

Gamifying Incident Response: Level Up Your Team’s Performance

I’m not comfortable with the notion of leveraging competition to “revolutionize” something as critical (and fragile, in many cases) as incident response. Curious for your thoughts… have you ever seen this approach work?

Grafana security release for CVE-2023-4822

Grafana Labs has released a new version of Grafana to address a medium severity CVE. You’ll want to upgrade if you’re using RBAC with cross-organizational roles.

Tools

weaveworks/grafanalib

“grafanalib lets you generate Grafana dashboards from simple Python scripts”
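
If you haven’t tried the dashboards-as-code idea yet, here’s a rough sketch of what it looks like in spirit. To be clear, this is not grafanalib’s API. It’s plain Python building a simplified subset of the Grafana dashboard JSON, and the helper names (`graph_panel`, `service_dashboard`), the service names, and the `http_requests_total` metric are all assumptions made up for illustration.

```python
import json


def graph_panel(title, expr, x, y, w=12, h=8):
    """Return one time series panel as a plain dict (hypothetical helper)."""
    return {
        "type": "timeseries",
        "title": title,
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
        "targets": [{"expr": expr, "refId": "A"}],
    }


def service_dashboard(service):
    """Build a two-panel dashboard for one service (hypothetical layout)."""
    selector = f'service="{service}"'
    return {
        "title": f"{service} overview",
        "tags": ["generated", service],
        "panels": [
            graph_panel(
                "Request rate",
                f"sum(rate(http_requests_total{{{selector}}}[5m]))",
                x=0, y=0,
            ),
            graph_panel(
                "Error rate",
                f'sum(rate(http_requests_total{{{selector},status=~"5.."}}[5m]))',
                x=12, y=0,
            ),
        ],
    }


# One function, many consistent dashboards: the part that helps with sprawl.
dashboards = {name: service_dashboard(name) for name in ("checkout", "search")}

if __name__ == "__main__":
    print(json.dumps(dashboards["checkout"], indent=2))
```

The appeal is that every service gets the same layout from a single definition, and changes go through code review instead of clicking around the UI. grafanalib gives you proper dashboard and panel classes for this rather than hand-rolled dicts.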

Events

OSMC 2023 - The Program is online!

OSMC has announced their agenda for their upcoming conference taking place in Nuremberg, Germany next month, 7-9 November 2023.

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor