Issue 235

Hope you enjoyed last week’s “Best Of” quarterly review. Plenty of variety in today’s issue, with stories covering debugging, tracing, open source, and alerting. Enjoy! 🍩☕📈

This issue is sponsored by:


Alerting is evolving. Signals is coming soon.

This winter from incident management platform FireHydrant: alerting and incident response in one ring-to-retro tool for the first time. Sign up for the early access waitlist and be the first to experience the power of alerting + incident response in one platform — at last.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Lessons from debugging a tricky direct memory leak

Debugging OOMs can be a massively frustrating experience, but it’s such a great learning opportunity for understanding your systems better. Great post from a Pinterest engineer tasked with a particularly icky one.

Surfacing performance issues with effective visualization of profiling data

I’ve always been fascinated with the application of different visualization types in monitoring and observability work. This post from an engineer working in the profiling space offers context around the strengths of each viz type.

Managing Prometheus alerts in Kubernetes at scale using GitOps

Organizing and managing alerting rules can be a major hassle as your teams and architecture grow. This post demonstrates a pattern for decentralizing ownership of your alerts using GitOps.

Sampling Strategies in Distributed Tracing — A Comprehensive Guide

Sampling is one of those things that nobody wants to do, but it’s generally inevitable if you’re getting real value from – and adoption of – your instrumentation. This post does an excellent job covering many of the foundational concepts and pitfalls to avoid along the way.

Building a Perl Script for High Availability Monitoring in Centreon

Although it doesn’t own much mindshare on this side of the Atlantic anymore, Perl is still a popular language for infrastructure and monitoring tooling elsewhere. Good to see that it’s still got plenty of life for prototyping solutions and delivering quick value.

Open Sourcing iris-message-processor

I miss the days of companies building in-house solutions and open sourcing them to “pay it forward”. Now it feels like everyone outsources everything (not just observability tooling), which… I get it, but it’s not nearly as fun imho. Props to LinkedIn engineering for sharing these tools with the community.

Google Cloud Synthetic Monitoring Tutorial

A tutorial for getting started with Google Cloud’s new synthetic monitors and running your own uptime checks.

Monitoring — Basics Of Prometheus, Grafana

A handy primer for anyone new to Grafana, Prometheus, and Alertmanager.

Introducing the Prometheus Java client 1.0.0

The Java client for Prometheus recently hit its milestone 1.0.0 release, including some great new features.

Tools

linkedin/iris

“Iris is a highly configurable and flexible service for paging and messaging”

linkedin/oncall

“Oncall is a calendar tool designed for scheduling and managing on-call shifts”

prometheus/client_java

“Prometheus instrumentation library for JVM applications”

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor