Plenty of diverse topics this week, from learning how to be an Incident Commander to CI monitoring to pluggable anomaly detection. Fun stuff – enjoy! 🚵‍♀️📈🍩

This issue is sponsored by:

The new principles of alerting

In this blog post, incident management platform FireHydrant argues that alerting tools have stopped innovating and lays out four principles on which any alerting tool should be built: cost-efficiency, service catalog empowerment, easier scheduling and substitutions, and clear distinctions between incidents and alerts.

Articles & News on

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Solving Metrics at scale with VictoriaMetrics

One engineer’s argument for VictoriaMetrics over Prometheus. I will note that there’s a lot of subtlety to comparing these tools that isn’t covered in this article, but it still raises some important considerations whether you’re looking at these two specific metrics systems or others in the ecosystem.

Managing Alerts in OpenShift with AlertManager CLI

If you’re an OpenShift admin, you should definitely read this one. Excellent level of detail with explanations and context. 👏👏👏

A guide to Incident Command

Having the right Incident Commander can make all the difference in how your team comes together to remediate an outage. Many folks are terrified of the responsibility that comes with the role, but this post explains in detail how to be an effective IC and lead your team through turmoil.

Bring Your Own Algorithm to Anomaly Detection

Pluggable algorithms aren’t really a new concept, but I haven’t seen many (any?) software projects that make them easily self-serviceable. Good example from Pinterest engineers on how they’re democratizing their anomaly detection platform.

Grafana Image Renderer v3.8.3 released - CVE-2023-4863

Grafana Labs has released a new version of the Grafana Image Renderer to address a high-severity CVE. Please go update your affected systems ASAP.

How to Monitor Jenkins Using Prometheus, Node exporter and Grafana

I stumbled across some CI monitoring posts recently and thought it would be fun to include them together since each offers something slightly different. This first one is a gritty guide through setting up Prometheus and Grafana to monitor Jenkins through the Node exporter and its own Prometheus metrics plugin.

Axiom is the best place to send all your OTel traces and makes it easy to collect and analyze 100% of your event data from staging, to production, to... your trips to the moon. (SPONSORED)

Monitor ArgoCD Application health with Prometheus

This one assumes that you’re already running ArgoCD in your Kubernetes cluster and dives right into your Prometheus configuration and metrics. Bonus points to the author for explaining some of the important metrics for ArgoCD and how to alert on them effectively.

Observability Tools for Development Teams: A Practical Guide

A superset of all possible observability topics that might be relevant to developers. There’s a lot in here, but the author does a pretty good job of going just deep enough on each topic to be informative without overwhelming the reader.

Role of Observability and Monitoring in Modern Software

A look at why we should bother with monitoring in the first place, particularly as it pertains to scaling and maintaining complex architectures. Share this one with the executive who’s been penny-pinching your observability budget.

How to easily retrieve values from a range in Grafana using a stat panel

Some quick tips for using stat panels in your Grafana dashboards.

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor