Plenty of diverse topics this week, from learning how to be an Incident Commander to CI monitoring to pluggable anomaly detection. Fun stuff – enjoy! 🚵‍♀️📈🍩
This issue is sponsored by:
The new principles of alerting
In this blog post, incident management platform FireHydrant argues that alerting tools have stopped innovating and lays out four principles on which any alerting tool should be built: cost-efficiency, service catalog empowerment, easier scheduling and substitutions, and clear distinctions between incidents and alerts.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
One engineer’s argument for VictoriaMetrics over Prometheus. I will note that there’s a lot of subtlety to comparing these tools that isn’t covered in this article, but it still raises some important considerations whether you’re looking at these two specific metrics systems or others in the ecosystem.
If you’re an OpenShift admin, you should definitely read this one. Excellent level of detail with explanations and context. 👏👏👏
Having the right Incident Commander can make all the difference in how your team comes together to remediate an outage. Many folks are terrified at the responsibility of the role, but this post explains in detail how to be an effective IC and lead your team through turmoil.
Pluggable algorithms aren’t really a new concept, but I haven’t seen many (any?) software projects that make them easy to self-serve. Good example from Pinterest engineers on how they’re democratizing their anomaly detection platform.
Grafana Labs has released a new version of the Grafana Image Renderer to address a high-severity CVE. Please go update your affected systems ASAP.
I stumbled across some CI monitoring posts recently and thought it would be fun to include them together, since each offers something slightly different. This first one is a gritty guide to setting up Prometheus and Grafana to monitor Jenkins through the Node exporter and Jenkins’ own Prometheus metrics plugin.
Axiom is the best place to send all your OTel traces and makes it easy to collect and analyze 100% of your event data from staging, to production, to... your trips to the moon. (SPONSORED)
This one assumes that you’re already running ArgoCD in your Kubernetes cluster and dives right into your Prometheus configuration and metrics. Bonus points to the author for explaining some of the important metrics for ArgoCD and how to alert on them effectively.
A superset of all possible observability topics that might be relevant to developers. There’s a lot in here, but the author does a pretty good job going just deep enough on each topic to be informative without overwhelming the reader.
A look at why we should bother with monitoring in the first place, particularly as it pertains to scaling and maintaining complex architectures. Share this one with the executive who’s been penny-pinching your observability budget.
Some quick tips for using stat panels in your Grafana dashboards.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor