Issue 259

This week’s issue is packed with variety. Loving the behind-the-scenes coverage of in-house tooling and another hands-on post from Brendan Gregg. Enjoy! 🌈📈📷

This issue is sponsored by:


In this postmortem, the Raygun team details an incident where a routine infrastructure cleanup inadvertently removed the load balancers associated with their API nodes, leading to a significant outage. A handy reminder of why monitoring and understanding your infrastructure dependencies matters for system resilience.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

KubeCon Europe 2024: Cloud Native In La Ville-Lumière

Recap of a talk from the recent Observability Day at KubeCon, diving into the challenges of extracting metrics and logs from trace spans at scale, and some of the approaches the author has used to date.

Distributed tracing for asynchronous workflows using OpenTelemetry

Hiver engineers demonstrate the use of distributed tracing for identifying bottlenecks in their asynchronous processing pipeline.

CI/CD observability: Extracting DORA metrics from a CD pipeline

The folks at Grafana Labs walk through a pattern for collecting DORA metrics for your continuous deployment pipeline. It skews pretty heavily toward Grafana Cloud, but this could all be done with most popular observability and CI/CD tooling.

Observability Maturity Model for AWS

An interesting model for evaluating your progress along the observability journey, bucketing the various pillars and technology domains into a matrix of capabilities and maturity levels.

Linux Crisis Tools

Another excellent post from Brendan Gregg, this time reviewing his go-to list of “crisis tools” for troubleshooting under pressure. And the imaginary scenario he describes hits a little close to home. 😜

GrafLI - An out-of-the-box Azure monitoring and visualization platform

LinkedIn engineers share details about their cloud-native Azure visualization tool GrafLI, what makes it unique, and which Azure features they used to build it.

Monitoring Kubernetes network traffic by using eBPF

How the SkyWalking project leverages eBPF for network traffic monitoring in Kubernetes clusters. There are a couple of limitations discussed near the end, but it still looks like a solid implementation.

Leveraging Fluent Bit in Large-Scale Machine Learning Model Pipeline

Although this post was written to demonstrate the strengths of Fluent Bit for ML workloads, it falls a bit short of that imho (at least when it comes to comparing its advantages against competing logging systems). Regardless, it offers some helpful details for anyone considering Fluent Bit for similar use cases.

Terraform: creating a module for collecting AWS ALB logs in Grafana Loki

A super detailed guide for creating a Terraform module to automate log collection from AWS load balancers. Should be a useful pattern for adding other tooling, or logs from other services, later on.

Events

Monitorama PDX 2024 - Agenda is Live!

Monitorama organizers released the speaker list and agenda for this year’s upcoming PDX 2024. Exciting to see so many unique topics; I can feel the FOMO rising already. 😅

Monitoring Weekly readers can save $100 on General Admission tickets with the MWEEKLY2024 discount code. Hope to see you there!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor