Issue 260
A little bit of something for everyone, from logging and OpenTelemetry to NGINX geolocation, and some motivational posts on alerting and incidents. Enjoy! ☕🔥🔔
This issue is sponsored by:
For organizations looking to succeed in a cloud native landscape, DevOps practices are merely the starting point. In this analyst guide from Intellyx and Google Cloud, explore how organizations can navigate the paradoxes of cloud native platform engineering and how cloud native impacts developer productivity and experience.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Minimizing on-call burnout through alerts observability
A look at how Cloudflare engineers maintain a healthy environment for on-call engineers through observability and analysis of their monitoring systems. I really enjoy seeing “behind the curtain” at other companies that have to deal with alerting noise at scale.
Transitioning to OpenTelemetry
One of the best posts I’ve read on adopting OpenTelemetry. I especially appreciate the honest comparisons between logging and tracing.
Klaviyo Incident Management: Interview with Laura Stone
An inspiring look at how one company planned for, built, and maintains a healthy culture of incident management and learning.
Grafana Labs recently announced their new Alloy collector, which deprecates the Grafana Agent (entering Long-Term Support immediately) and effectively graduates the Grafana Agent Flow experiment. They’ve published a separate post that aims to dispel some of the confusion around Yet Another Agent ™.
Building the Eye of Sauron: Enriching Nginx logs with GeoIP
There are different methods for collecting and visualizing geolocation data for your services, but this approach for NGINX is unique in that it doesn’t require recompiling it with the geoip module. Looking forward to the next post in the series where they hook it up to Grafana.
Raygun error monitoring processes over 90 million crash reports per day. This blog steps through their experiment switching to a possibility tree for more effective string parsing, speeding up the parsing of data flowing through the Raygun processing pipeline by 45x. (SPONSORED)
A quick collection of logging best practices and tips.
A Practical Guide to Monitoring & Observability of IoT Devices
Monitoring a fleet of IoT devices is not for the faint of heart. This author walks through some common challenges and presents a solid list of best practices and considerations to get you started.
Beyond Silos — Achieving Observability Excellence in Multi-Team Serverless Environments
If you run serverless in an organization with tight controls, this article was written for you. There’s no silver bullet here, but it should help inform how you coordinate observability across your teams.
Simplify Your Logs Management with Grafana’s New Explore Logs App
I shouldn’t love this as much as I do, but as someone who used to deep-dive through Graphite’s metrics navigation tree, this sort of exploration really speaks to me. Great to see Grafana Logs add support for Loki in Explore.
Push Script Outputs to Prometheus and Grafana Using Pushgateway
Super simple example for pushing custom metrics to your Prometheus from the CLI (or a script).
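For readers who would rather push from a script than from curl, here is a minimal sketch of the same idea in Python. It is not taken from the linked article. It assumes a Pushgateway reachable at `http://localhost:9091` (the default port), and the helper names (`build_payload`, `push_url`, `push`), the job name, and the metric name are all hypothetical. It relies only on the Pushgateway's documented `/metrics/job/<job>` endpoint and the Prometheus text exposition format.

```python
from urllib import request
from urllib.parse import quote


def build_payload(metrics):
    """Render {name: value} pairs in the Prometheus text exposition format."""
    lines = [f"{name} {float(value)}" for name, value in sorted(metrics.items())]
    # The Pushgateway expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def push_url(base, job):
    """Build the push endpoint for a job; the job name is URL-encoded."""
    return f"{base.rstrip('/')}/metrics/job/{quote(job, safe='')}"


def push(base, job, metrics, timeout=5):
    """PUT the metrics to the Pushgateway, replacing this job's metric group."""
    req = request.Request(
        push_url(base, job),
        data=build_payload(metrics).encode("utf-8"),
        method="PUT",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical job and metric names; requires a running Pushgateway.
    push("http://localhost:9091", "nightly_backup", {"backup_duration_seconds": 12})
```

PUT replaces every metric in the job's group, while POST replaces only metrics with the same name, so swap the method if the script reports a subset of metrics on each run.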
Events
Monitorama PDX 2024 - Agenda is Live!
Monitorama organizers have released the speaker list and agenda for this year’s PDX 2024 event. Exciting to see so many unique topics. I can feel the FOMO rising already. 😅
Monitoring Weekly readers can save $100 on General Admission tickets with the MWEEKLY2024 discount code. Hope to see you there!
Job Opportunities
Staff Site Reliability Engineer, Observability at Fastly (US Remote)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor