Issue 177

Hope you’re all staying safe in this near-global heat wave. Why not grab a cool drink and stay indoors with this week’s newsletter? Plenty of great articles covering monitoring tools and software this week, including some interesting stories from Netflix, Uber, and Intercom. Enjoy! 🌞🧊👋

This issue is sponsored by:

Say goodbye to manual evidence collection and hello to automated compliance. Drata, G2’s highest-rated cloud compliance software, offers 60+ integrations that seamlessly connect with the tech stacks you use to manage compliance across your organization. Monitoring Weekly readers get 10% off Drata here.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

The Mathematics Behind Monitoring

We don’t talk nearly as much about the math of our metrics as we did in years past. Here’s a brief introduction to some of the more important metric types found in Prometheus, and when to use each.

Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem

Some fascinating insights into how Netflix leverages data to predict OOMs in their client applications.

Monitoring app performance. OpenTelemetry & Jaeger

An engineer shares their experience adopting OpenTelemetry and Jaeger in this concise tutorial. Good stuff.

Vertical CPU Scaling: Reduce Cost of Capacity and Increase Reliability

How Uber leverages their utilization metrics to automate resource allocation across their fleet. Some insightful examples that should be applicable elsewhere.

Monitoring Kubernetes — With Prometheus And Nagstamon

I haven’t personally heard of this status monitor tool before, but I can see why some folks might use it. I do like that you can trigger certain actions right from the desktop.

Introduction to OpenTelemetry & Distributed Tracing

An ongoing multi-part series looking at distributed tracing, why to consider OpenTelemetry, instrumenting a sample project, and using Elasticsearch for persistent trace storage (with an upcoming article on custom identifiers).

Building a resilient system: Our journey to observability at Intercom

How Intercom engineers re-evaluated their observability tooling, which compromises they had to make, and how they achieved increased adoption across the company.

At Chronosphere, we’re constantly thinking about what’s next in observability. That’s why we want to provide a new outlet to engage with the community on the latest developments, solutions, and philosophies in the observability space. Fancy yourself a cloud native enthusiast like us? Check out our new YouTube series covering all things observability here. (SPONSORED)

Release of Prometheus 2.37 - Long-Term Support

The Prometheus project recently shipped version 2.37, its first “LTS” release with a six-month support lifecycle (in contrast to the usual six-week release cycle).

How we improved Grafana Mimir query performance by up to 10x

Grafana Labs explains their new query sharding functionality for Mimir. If I’m being honest, this feels like a leaky abstraction that I shouldn’t have to think about, but at least they’re trying to educate users on how to leverage it.

How to Monitor PHP-FPM with Prometheus

A quick tutorial for getting up and running with the PHP-FPM exporter for Prometheus.

Pg-agent – a Postgres exporter for Prometheus focusing on query performance

The folks at Coroot have developed pg-agent, an alternative PostgreSQL exporter for Prometheus that they claim is superior for monitoring query performance statistics.

Tools

coroot/coroot-pg-agent

“A Prometheus exporter for Postgres focusing on query performance statistics”

henriwahl/nagstamon

“Nagstamon is a status monitor for the desktop. It connects to multiple Nagios, Icinga, Opsview, Centreon, Op5 Monitor/Ninja, Checkmk Multisite, Thruk and monitos monitoring servers.”

Job Opportunities

Software Engineer - Reliability at Figma (US Remote)

Senior Software Engineer at Six Nines (US Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor