Issue 240

It was a busy week with so many folks at KubeCon NA… hope everyone made it home safe and without the travel bug. The changing seasons and colder weather mean it’s a great time to hang inside and cuddle up with your favorite newsletter. Enjoy! 🚢🤧🍂

This issue is sponsored by:

As cloud native architectures have been more widely adopted, “hero developer” culture, fragmented visibility, and misaligned tools have been unfortunate side effects. This is why Chronosphere has reimagined the developer experience with our latest features. See what we’ve been up to.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Unlocking Metrics: The Art of Accurate Metrics in Multi-Stage Processes

Engineers at Deutsche Telekom share an interesting approach to metrics collection for the lifecycle of a request. Feels like a more accurate representation of a request with the targeted nature of a metric and without the full weight of a trace.

Sofia’s Observability Odyssey: Bad Log Day

The next chapter in Sofia’s journey to observability enlightenment. This one resonates with me, at least in the sense of reducing logging bloat with more deliberate metrics instrumentation and collection.

A rabbit hole in monitoring

I really enjoyed reading this post and appreciate the author’s honesty, but imho this situation could’ve been easily avoided. I would also encourage them to be mindful of their next technology decision because it sounds like history might repeat itself.

o11y theory: instrumentation

Some thoughts on instrumentation, getting buy-in from your team, and an argument for auto-instrumentation with OpenTelemetry.

Getting started with Tetragon on GKE

A look at Tetragon, an observability tool designed for the security domain. Interesting to see that it goes beyond basic introspection to full-on policy enforcement.

How to write a Postmortem

A framework for writing postmortems, with templates, some solid references, and sample incidents. Even if you already have a solid incident response program in place, you might pick up a few tips.

“Change is the essential process of all existence.” - Spock

It's time for alerting to evolve. When FireHydrant launches Signals this winter, it will be the first time alerting has existed natively in a full-cycle incident management platform. Get a first look at how FireHydrant is architecting Signals for resiliency in the Signals Captain's Log. (SPONSORED)

How to use OpenTelemetry to expose custom Prometheus metrics from nodeJS applications

A concise guide for instrumenting your Node.js apps with OpenTelemetry.

How We Manage Incidents at Datadog

Speaking of incident response, Datadog is happy to show you how they manage their own outages (with their product, of course). Snark aside, they have some pretty solid tools for IM and this walkthrough demonstrates how they all fit together.

Simplifying Kubernetes Logging with EFK Stack

An exhaustive guide for collecting and managing Kubernetes logs with Elasticsearch, Fluent Bit, and Kibana.

Tools

cilium/tetragon

“Cilium’s new Tetragon component enables powerful real-time, eBPF-based Security Observability and Runtime Enforcement.”

Events

Monitorama PDX 2024 - Call for Participation

The Monitorama conference has opened its Call for Participation for next year’s event in Portland, OR. Submissions are due by February 4, 2024. Hope to see you there!

Job Opportunities

Senior Site Reliability Engineer, Platform Automation at Zscaler (US Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor