Awesome collection of posts this week with an emphasis on AI/LLM monitoring, adopting OpenTelemetry in Go and Rust, and some heavy feels around the state of observability and learning from our mistakes. Enjoy! 😇🦀🧠

This issue is sponsored by:

Run GitHub Actions up to 2x faster at half the cost

Blacksmith runs your GitHub Actions substantially faster by running them on modern gaming CPUs. Integrating Blacksmith is a one-line code change. 100+ companies like GitBook, Superblocks, and Slope use Blacksmith to help developers merge code faster.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Leveraging LLM-as-a-Judge for Observability

Very interesting review of observability concerns as they pertain to LLM performance and accuracy within the context of the model itself.

Which Metrics Should You Monitor for Large Language Model Performance?

In contrast to the previous article, this one is a very no-nonsense look at metrics relevant to LLM performance (e.g., hardware, system throughput).

Advanced Monitoring with AI and Prometheus: Detecting and Mitigating Memory Leaks

A very approachable demonstration of how basic heuristic monitors can be leveled up with machine learning and a tiny bit of code.

Is It Time To Version Observability? (Signs Point To Yes)

Some real talk from Charity Majors on the successes (and failures) of “Observability 1.0”. There’s plenty to chew on here, and while I largely agree with her points, I still sense some bias towards particular types of developers and systems. Regardless, an excellent post.

Go and OpenTelemetry: A real-world implementation on open-source software

A unique look at the effort and techniques involved in updating an open-source project to leverage OpenTelemetry tracing.

Simple OpenTelemetry logger in Rust

A nice complement to the previous post, this one takes a similar look at adopting OpenTelemetry, but for a Rust app.

What your SLOs aren’t telling you about mobile

Join Embrace on September 26th at 1pm ET to learn how to craft and monitor SLOs that are specialized for mobile and connect directly to user experiences. Level up your observability practice with mobile app performance insights. (SPONSORED)



Addressing Tool Sprawl Without Falling Prey to Vendor Lock-In

The siren song of consolidation and reducing tool sprawl can be alluring, but without the right planning up front it can also lead to vendor lock-in and a loss of flexibility.

Sampling Strategies for Monitoring (Part 1)

Comparing the tradeoffs between static and dynamic sampling of monitoring data.

Incident Management for New Engineers

I have so much empathy for this engineer and their vulnerability in learning how to fail blamelessly. This is how we learn effectively, both individually and as a team. Thank you to the author for sharing their story.

NMS Migration Made Easy: Get Stakeholders Aligned

Some important lessons here for anyone trying to land a new observability initiative. It doesn’t matter how good your technology is if the users and stakeholders aren’t invested in its success.

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor