Fun collection of diverse posts this week, with a recurring theme of production debugging and system design deep-dives. Oh, and just one more week until Monitorama… hope to see you there! 🌻🐛💾

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Making sense of Grafana Dashboards

This article was written for anyone who’s installed Grafana, hooked up some exporters, and was left scratching their head wondering “what now?”. It does a great job deconstructing the parts that most of us take for granted, or had to piece together from a bunch of reference docs.

UnknownHostException? CoreDNS fail? A Journey of Troubleshooting a National Platform Incident

I always enjoy a good debugging story, with bonus points for uncovering an AWS App Mesh bug (and getting AWS to acknowledge it).

Web Performance Regression Detection (Part 2 of 3)

The second post in Pinterest’s series looking at regression detection, this time focusing on real-time user metrics, alerts and response. Tons of detail and explanations, excellent article.

Grafana Loki query acceleration: How we sped up queries without adding resources

A deeper dive into some of the design decisions behind Loki’s bloom filters and the query performance gains that resulted.

Virtualizing Our Storage Engine

Another look at query performance improvements, this time related to changes within Honeycomb’s internal storage service.

Tracking User Experience with Datadog RUM: A Beginner’s Guide

We use Datadog RUM at $dayjob and it feels like we’re learning something new about it every day. This guide aims to alleviate some of the initial onboarding pain, with future articles intending to go further.

Monitoring DigitalOcean Apps with GitHub Actions

A clever pattern for leveraging GitHub Actions to monitor cloud runtime logs and take remedial action.

Improving Elasticsearch configuration for better resource consumption for Logs use cases

A collection of configuration best practices for more efficient Elasticsearch resource usage during heavy logs ingestion.

Application Security Through the Lens of OpenTelemetry

Looking at the intersection of observability and application security thanks to the emergence of OpenTelemetry.

Tools

openclarity/apiclarity

An API security tool to capture and analyze API traffic, test API endpoints, reconstruct OpenAPI specifications, and identify API security risks.

Events

Monitorama PDX 2024 - Last Chance!

Just one more week until this year’s Monitorama event in Portland, OR. So much buzz around this year’s agenda and speakers. I genuinely hope I’ll get to see you at the event… please walk up and say hi! 👋

Monitoring Weekly readers can still get $100 off General Admission tickets with the MWEEKLY2024 discount code. Hope to see you there!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor