A super fun week chock-full of production stories and reflections. I particularly love the posts from DoorDash and Slack engineers. Enjoy! 🌈🚪💨
This issue is sponsored by:
The Darkest Knight
In 2012, a company called Knight Capital lost $440 million in 45 minutes and disrupted the stock market, all due to one failed deployment. Don’t let this happen to you. Read more deployment horror stories in this blog post.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Love the detail from DoorDash engineers on their migration to Prometheus. However, I do find it odd they chose to omit which legacy “monitoring backend” they moved away from (StatsD is an aggregation service, not a backend). A quick Google search shows they were using Wavefront (now a smaller part of VMware Aria) back in 2018. Looking forward to their re-architecture for OTLP metrics ingestion in a couple of years. 😉
Always good to see open standards prevail. I’m curious whether this new OTLP ingestion endpoint, combined with the continued popularity of OpenTelemetry, will push the Prometheus maintainers to reconsider their commitment to pull-based collection.
P.S. You should definitely read the caveats in this comment about the performance hit from converting these namespaces into Prometheus labels. It sounds like a caching fix is forthcoming.
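To make that performance concern a bit more concrete, here is a minimal sketch (not the actual Prometheus code) of translating dotted OpenTelemetry attribute names such as service.namespace into valid Prometheus label names, with a memoizing cache so repeated names aren't re-sanitized for every sample. The simplified rules, the "key_" prefix for names starting with a digit, and the cache size are assumptions for illustration; the upstream translation may differ.

```python
import re
from functools import lru_cache

# Anything outside the Prometheus label-name alphabet [a-zA-Z0-9_].
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


@lru_cache(maxsize=4096)  # cache size is an arbitrary assumption
def to_prometheus_label(attr_name: str) -> str:
    """Convert an OTel-style attribute name (e.g. 'service.namespace') into a
    Prometheus-safe label name. Simplified, assumed rules, not the upstream code."""
    # Replace every invalid character (dots, dashes, etc.) with an underscore.
    name = _INVALID.sub("_", attr_name)
    # Label names may not start with a digit; prefixing with "key_" is an assumed convention.
    if name and name[0].isdigit():
        name = "key_" + name
    return name
```

The cache only pays off because the set of distinct attribute names is small compared to the number of samples carrying them; the sanitization itself is cheap, but it runs on every ingested series.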
Speaking of OpenTelemetry, here’s a solid introduction for anyone looking to learn the basics before diving in.
The next chapter in Sofia’s “Journey to Observability”. I still can’t tell if this is based on a real person or is supposed to be DevOps manga, but I’m here for it.
I was genuinely impressed by this post from Slack engineers. They have a strong history of observability leadership, and it’s clear they also give a damn about their customers’ experience. This is an excellent read; I virtually guarantee you’ll take away something interesting.
This appears intended to serve as an exhaustive overview of Kubernetes observability; it does a decent job touching on all of the related topics, but you’ll want to dig deeper into any specific area. Frankly, if you just grabbed all of the section titles, they would make a great checklist for your manager. 😜
How to add effective logging to your application, with some tips for Promtail, Loki, and Grafana dashboards. This almost reads like a confessional by a developer figuring this stuff out for the first time, but they did a nice job capturing their experience.
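For anyone doing this for the first time, one common starting point is to emit one JSON object per log line, which Promtail can then parse (for example with its json pipeline stage) before shipping the lines to Loki. The sketch below uses only Python's standard logging module; the field names and the "fields" convention for passing extra attributes are hypothetical, not taken from the linked post.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured fields passed via extra={"fields": {...}} (hypothetical convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Calling build_logger("checkout").info("order placed", extra={"fields": {"order_id": "A123"}}) writes one JSON line with ts, level, logger, msg, and order_id keys. Values like order_id are best kept as log fields rather than promoted to Loki labels, since high-cardinality labels are expensive in Loki.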
A simple yet handy example for adding (or merging) custom labels in your logs with Fluentd.
“Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus.”
“This module provides a transport for pino that forwards messages to a Loki instance.”
“Very low overhead Node.js logger.”
PromCon EU 2023 is the eighth conference fully dedicated to the Prometheus monitoring system. It will take place 2023-09-28 & 2023-09-29 (Thu & Fri) in Berlin as a single-track event with space for 300 attendees.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor