I hope you all enjoyed our quarterly “Best of” review last week. Today’s issue is a fun collection of hands-on articles and guides, heavy on microservices, OpenTelemetry, and Prometheus topics. Enjoy!
This issue is sponsored by:
Regardless of where you are on your incident management maturity journey, there’s a right next step you can take. Learn about three areas of focus — roles, services, and retros — why they’re important, and how to improve at any level in "3 ways to improve your incident management program in 2023."
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Prioritizing Development Efforts with SLOs in Microservices
An excellent overview of service levels and related concepts, with actually helpful query examples. This article is on the longer side but well worth your time.
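The budget arithmetic behind these concepts fits in a few lines. Here's a minimal Python sketch, assuming a time-based view of the error budget and a request-based SLI for the burn rate. The function names are ours, not taken from the article.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed 'bad' time in the window for an SLO target such as 0.999."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    return window * (1.0 - slo)


def burn_rate(good_events: int, total_events: int, slo: float) -> float:
    """Observed error ratio divided by the allowed error ratio.

    1.0 means the budget is consumed exactly over the window;
    higher values mean it runs out early.
    """
    if total_events == 0:
        return 0.0
    observed_error_ratio = 1.0 - good_events / total_events
    return observed_error_ratio / (1.0 - slo)


if __name__ == "__main__":
    print(error_budget(0.999, timedelta(days=30)))
    print(burn_rate(good_events=99_500, total_events=100_000, slo=0.999))
```

A 99.9% target over 30 days leaves 43 minutes and 12 seconds of budget, and a sustained burn rate of 5 would exhaust it in about six days.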
Scaling kubernetes monitoring with prometheus federation
If you’re running disparate Prometheus servers, you’re probably making life difficult for your users. This guide explains the basics of Prometheus federation and gets you started with a basic configuration.
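Federation works by having a global Prometheus scrape the `/federate` endpoint of each downstream server, using `match[]` selectors to choose which series to pull. The sketch below builds that scrape job as a Python dict mirroring the usual YAML keys, plus the equivalent request URL. The hostnames, job name, and selectors are hypothetical placeholders, not taken from the guide.

```python
from urllib.parse import urlencode


def federation_scrape_config(targets: list[str], selectors: list[str]) -> dict:
    """Scrape job for the global server, shaped like prometheus.yml."""
    return {
        "job_name": "federate",
        "scrape_interval": "15s",
        "honor_labels": True,            # keep the downstream job/instance labels
        "metrics_path": "/federate",
        "params": {"match[]": selectors},  # each selector becomes a match[] param
        "static_configs": [{"targets": targets}],
    }


def federate_url(base_url: str, selectors: list[str]) -> str:
    """The URL the global server ends up requesting from one downstream server."""
    if not selectors:
        raise ValueError("federation needs at least one match[] selector")
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    selectors = ['{job="kubernetes-pods"}', '{__name__=~"job:.*"}']
    print(federation_scrape_config(["prom-cluster-a:9090"], selectors))
    print(federate_url("http://prom-cluster-a:9090/", selectors))
```

Setting `honor_labels` to true keeps the labels exposed by the downstream servers when they conflict with the ones the federating server would attach, so series from different clusters stay distinguishable.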
Day 2 Observability - calls to other services
Having to call out to external providers can make it challenging to maintain a high degree of observability. This post introduces a couple of patterns for collecting OpenTelemetry spans in these scenarios.
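Whichever pattern you choose, one common building block is recording a client-side span around the outbound request and propagating W3C Trace Context to the provider. This stdlib-only sketch shows the `traceparent` header handling (version `00` only). The helper names are hypothetical, and in a real service the OpenTelemetry SDK's propagators would normally handle this for you.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version "00" - 32-hex trace-id - 16-hex parent-id - 2-hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


def new_client_span(parent: SpanContext) -> SpanContext:
    """Child span for an outbound call: same trace ID, fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize the context into the header sent to the external provider."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is malformed or invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return SpanContext(trace_id, span_id, (int(flags, 16) & 0x01) == 1)
```

In practice you would set this header on the outbound HTTP request and record the client span's start and end times locally, so the call still shows up in your trace even if the provider never reports anything back.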
System Observability in a nutshell
A vendor-agnostic look at what Observability really means in a systems context. Useful for SRE folks and anyone else who cares for production services but might not actually be developing the services themselves.
Reading about how others think about alerts (and the failures of bad alerts) is something I’ll never get tired of, and unfortunately, something I think we’ll never really master as a discipline. Still, it’s important to share our learnings and continue evolving our practice.
How to find unused Prometheus metrics using mimirtool
I’m a little sad that this tool needs to exist (I wrote something similar for Graphite a decade ago?) but it does address a valid need. Worth adding to your sack of monitoring tools.
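The core idea behind this kind of analysis is a set difference: metric names that exist in storage minus names that any dashboard or rule actually references. Below is a deliberately naive Python sketch. It assumes you've already fetched the stored names (for example from Prometheus's `/api/v1/label/__name__/values` endpoint) and exported the PromQL expressions from your dashboards and rules. This is not how mimirtool is implemented, and a real tool should parse PromQL properly rather than match tokens.

```python
import re

# PromQL identifiers (colons are allowed, e.g. in recording rule names).
IDENT_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Quoted string literals such as label values; they should not count as references.
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'')


def referenced_names(expressions):
    """Every identifier-like token that appears outside string literals."""
    names = set()
    for expr in expressions:
        names.update(IDENT_RE.findall(STRING_RE.sub("", expr)))
    return names


def unused_metrics(stored_metric_names, expressions):
    """Stored metric names that no expression mentions, sorted for review."""
    return sorted(set(stored_metric_names) - referenced_names(expressions))
```

The token match misses regex selectors such as `{__name__=~"node_.*"}`, so treat its output as a list of candidates to review, not a delete list.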
OpenTelemetry and Observability: Implementing Effective Distributed Tracing
Just enough of a guide to get you up and running with OpenTelemetry and Zipkin before your coffee cools down.
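To see what actually lands in Zipkin, it helps to look at the span shape. This Python sketch builds spans by hand in Zipkin's v2 JSON model (hex trace and span IDs, microsecond `timestamp` and `duration`, `kind`, `localEndpoint`). In a setup like the guide's, an OpenTelemetry exporter would normally produce this for you. The service names, span names, and tag are made up for illustration.

```python
import json
import secrets
import time


def zipkin_span(service, name, *, duration_us, kind="SERVER", trace_id=None,
                parent_id=None, start_us=None, tags=None):
    """Build one span in Zipkin's v2 JSON model.

    Timestamps and durations are in microseconds; IDs are lowercase hex.
    """
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 128-bit trace ID
        "id": secrets.token_hex(8),                     # 64-bit span ID
        "name": name,
        "kind": kind,  # e.g. SERVER, CLIENT, PRODUCER, CONSUMER
        "timestamp": start_us if start_us is not None else time.time_ns() // 1_000,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
        "tags": {key: str(value) for key, value in (tags or {}).items()},
    }
    if parent_id is not None:
        span["parentId"] = parent_id  # omitted entirely for root spans
    return span


if __name__ == "__main__":
    root = zipkin_span("checkout", "get /cart", duration_us=1_500)
    child = zipkin_span("checkout", "select cart", kind="CLIENT", duration_us=400,
                        trace_id=root["traceId"], parent_id=root["id"],
                        tags={"db.system": "postgresql"})
    # Zipkin's collector accepts a JSON array of spans, by default at
    # http://localhost:9411/api/v2/spans; here we only print the payload.
    print(json.dumps([root, child], indent=2))
```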
5 reasons why you should not reopen your IT incidents
I’m surprised to hear there are teams out there that will reopen incidents. If this is something you experience regularly, please read this article and then… just don’t.
Not your grandfather’s logs — A Java library’s new approach to observability
We took our first look at the Micrometer project a few months ago, and it’s back again, this time courtesy of an author who was equally surprised to discover it. I agree with the author here: the project creators did a nice job of designing an instrumentation library that abstracts away some of the pillar-ness of modern Observability approaches.
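The “SLF4J, but for observability” idea boils down to instrumenting code once and letting pluggable handlers decide what to emit. Here's a rough Python analogue of that facade pattern. The class and method names are our own and only loosely echo Micrometer's Java `Observation` and `ObservationHandler` types, so don't read this as Micrometer's API.

```python
import time
from contextlib import contextmanager


class ObservationHandler:
    """One backend (metrics, tracing, logs...) reacting to observations."""

    def on_start(self, name, context):
        pass

    def on_stop(self, name, context):
        pass


class TimerHandler(ObservationHandler):
    """Records durations per observation name, like a metrics timer."""

    def __init__(self):
        self.durations = {}

    def on_start(self, name, context):
        context["start"] = time.perf_counter()

    def on_stop(self, name, context):
        elapsed = time.perf_counter() - context["start"]
        self.durations.setdefault(name, []).append(elapsed)


class LogHandler(ObservationHandler):
    """Emits one log line per finished observation."""

    def __init__(self):
        self.lines = []

    def on_stop(self, name, context):
        self.lines.append(f"{name} finished error={context.get('error')!r}")


class ObservationRegistry:
    """The facade: application code only ever talks to this."""

    def __init__(self, *handlers):
        self.handlers = list(handlers)

    @contextmanager
    def observe(self, name):
        context = {}  # shared scratch space for all handlers
        for handler in self.handlers:
            handler.on_start(name, context)
        try:
            yield context
        except Exception as exc:
            context["error"] = exc
            raise
        finally:
            for handler in reversed(self.handlers):
                handler.on_stop(name, context)
```

The instrumented code writes `with registry.observe("db.query"): ...` once; swapping or adding a handler changes what gets emitted without touching that call site.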
OpenTelemetry: Sending Traces From Ingress-Nginx to Multi-Tenant Grafana Tempo
I love this use of the Ingress-NGINX controller to tag traces for Grafana Tempo in a multi-tenant Kubernetes environment.
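The essence of multi-tenant routing is deciding which tenant each span belongs to and sending it with the right tenant header. This Python sketch assumes Tempo runs with multi-tenancy enabled, where the tenant is identified by the `X-Scope-OrgID` request header, and that spans carry the `k8s.namespace.name` resource attribute. The namespace-to-tenant map, the default tenant, and the function name are hypothetical, and the article's actual routing (for example inside an OpenTelemetry Collector) may differ.

```python
from collections import defaultdict

# Request header Tempo reads to identify the tenant when multi-tenancy is on.
TENANT_HEADER = "X-Scope-OrgID"


def batches_by_tenant(spans, tenant_map, default_tenant="shared"):
    """Group spans into per-tenant export batches.

    Each result is (headers, spans); spans whose namespace is missing or
    unmapped fall back to default_tenant.
    """
    batches = defaultdict(list)
    for span in spans:
        namespace = span.get("resource", {}).get("k8s.namespace.name")
        batches[tenant_map.get(namespace, default_tenant)].append(span)
    return [({TENANT_HEADER: tenant}, batch)
            for tenant, batch in sorted(batches.items())]
```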
Unlocking Real-time Data Power with Apache Kafka: Exploring Its Top Use Cases
We don’t cover Kafka much here, in spite of its importance in many Observability pipelines. If you’re not already familiar with it, this article covers some of its more common and appropriate uses.
“An application observability facade for the most popular observability tools. Think SLF4J, but for observability.”
Monitorama has announced their full agenda for this year’s event. Looks like an awesome collection of topics and speakers. Hope to see you there!
Software Engineer at Nava (US Remote)
Sr Software Engineer at Nava (US Remote)
Senior Cloud Infrastructure Engineer at Quick Quack Car Wash (US Remote)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor