Some deeply technical and fun stories from production this week. Hope you enjoy reading them as much as I did. If you’re a Go developer, make sure to check out the NilAway article too! ⛄🦃🏳‍🌈

This issue is sponsored by:


Stop sampling & capture everything needed for o11y, security, analytics, and more. Axiom efficiently ingests, stores, and queries 100% of your app and infra telemetry with no sampling or cold storage required. Within seconds, know exactly what happened 3 minutes, 3 months, or 3 years ago.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

OpenTelemetry parameter that might ruin your flexibility

An excellent article on an underserved topic: the importance of choosing the right aggregation temporality (delta vs. cumulative) for your OpenTelemetry metrics, and its impact on the long-term portability of your data.
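Why does temporality matter for portability? A cumulative point carries the running total since a start time. A delta point carries only the change since the previous report. Here is a minimal Go sketch, assuming a single monotonic counter series and two hypothetical helpers (`toDeltas`, `toCumulative`) that are not part of any OpenTelemetry SDK. Going from cumulative to delta only needs adjacent points. Going back requires a stateful running sum, so one dropped delta skews every later total.

```go
package main

import "fmt"

// toDeltas converts cumulative counter samples into per-interval deltas
// (n samples yield n-1 deltas). Hypothetical helper for illustration: any
// decrease is treated as a counter reset (e.g. a process restart), and the
// new value is used as the delta.
func toDeltas(cumulative []float64) []float64 {
	if len(cumulative) == 0 {
		return nil
	}
	deltas := make([]float64, 0, len(cumulative)-1)
	for i := 1; i < len(cumulative); i++ {
		d := cumulative[i] - cumulative[i-1]
		if d < 0 {
			// Counter reset: the count restarted from zero.
			d = cumulative[i]
		}
		deltas = append(deltas, d)
	}
	return deltas
}

// toCumulative rebuilds running totals from deltas. It needs state (the
// running sum), and any lost delta corrupts every total after it.
func toCumulative(deltas []float64) []float64 {
	out := make([]float64, len(deltas))
	var sum float64
	for i, d := range deltas {
		sum += d
		out[i] = sum
	}
	return out
}

func main() {
	cum := []float64{10, 15, 22, 3, 8} // counter reset between 22 and 3
	fmt.Println(toDeltas(cum))         // [5 7 3 5]
	fmt.Println(toCumulative(toDeltas(cum)))
}
```

The asymmetry is the portability point. Data stored as cumulative can be re-derived as deltas later. Data stored only as deltas depends on every point having arrived.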

Insights from building a scalable distributed tracing platform for adidas

Lessons learned adopting distributed tracing (and its effects on the rest of their observability stack) inside a Platform team at Adidas Group.

VictoriaMetrics: pushing metrics without Prometheus Pushgateway

Long-time readers know I’m a stubborn fan of the push (vs pull) model for metrics collection. This example demonstrates VictoriaMetrics’ native support for push, eliminating a potential extra hop in your data pipeline.

Improving Efficiency Of Goku Time Series Database at Pinterest

The start of a new series of posts from Pinterest engineering revisiting their in-house time-series database, looking back at some of the challenges they faced since its original design and how they’ve adapted it to meet their growing needs. TSDB geeks should love this one.

NilAway: Practical Nil Panic Detection for Go

We don’t typically cover programming tools like linters here, but this post from Uber Engineering provides a fascinating look into their approach to nil panic detection and mitigation.
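Here is a minimal Go sketch of the kind of nil flow NilAway is built to detect statically. A lookup can return a nil pointer, and a caller dereferences it without a check. The `User` type and the functions are hypothetical. Whether a given NilAway version reports this exact snippet is an assumption, not something taken from the post.

```go
package main

import "fmt"

type User struct {
	Name string
}

// findUser returns nil when the id is missing: a nil-able result
// that every caller must handle.
func findUser(users map[string]*User, id string) *User {
	return users[id]
}

// greetUnsafe dereferences the result without a check, so it panics at
// runtime for an unknown id. This is the class of bug a static nil-flow
// checker aims to flag before it ships.
func greetUnsafe(users map[string]*User, id string) string {
	return "hello, " + findUser(users, id).Name
}

// greet guards the nil case and reports it to the caller instead.
func greet(users map[string]*User, id string) (string, bool) {
	u := findUser(users, id)
	if u == nil {
		return "", false
	}
	return "hello, " + u.Name, true
}

func main() {
	users := map[string]*User{"42": {Name: "gopher"}}
	fmt.Println(greet(users, "42")) // hello, gopher true
	fmt.Println(greet(users, "7"))
}
```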

How Fixing my Typo Improved Cribl Search Query Performance by 20x

Writing performant log queries is an underappreciated skill imho, and this story from a Cribl engineer proves that even the professionals can struggle to cobble together the right bits at times.

Kubernetes: Liveness and Readiness Probes — Best practices

A handy primer on Kubernetes liveness and readiness probes and how to use each one for what it is actually meant to signal.

OCI Cross-tenancy log management

Some policy considerations and examples to be aware of if you have to support cross-tenancy log collection in Oracle Cloud Infrastructure (OCI).

Tools

uber-go/nilaway

NilAway is a static analysis tool that seeks to help developers avoid nil panics in production by catching them at compile time rather than runtime.

Events

Monitorama PDX 2024 - Early Bird Tickets

Early Bird tickets are running out soon, so grab yours before prices go up for General Admission seating. And don’t forget to submit your talk proposal before the CFP deadline!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor