Some fun and interesting articles this week. Any week I can reference Brendan Gregg’s work with flame graphs is probably a good one. Oh, and some great articles on Kubernetes and the Cilium CNI… enjoy! 🔥📈🔔
This issue is sponsored by:
Sysdig Monitor is making it easier to find important details about your clusters, namespaces, and deployments with a new feature called Advisor. In this on-demand webinar, you will learn how Advisor can help you debug and solve difficult Kubernetes problems 10x faster! Watch Now!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
This article is more of an overview and explanation of Cortex than it is a tutorial for using it to scale your Prometheus clusters, but it’s still an accurate introduction to Cortex’s features and architecture.
An approachable explanation of flame charts and percentiles. After you read this one, go immerse yourself in Brendan Gregg’s massive collection of flame graph resources.
I’m not a Kubernetes expert by any means, but I still learned a ton about pod preemption and the limits and criteria related to it. If you’re responsible for monitoring a Kubernetes cluster, you owe it to yourself to read this one.
I love that Airbnb has built their own incident management Slack chatops bot and shared the story with us here. However, our industry is stuffed with vendors that do precisely this. Unless you’ve identified a specific reason to build your own, I’d probably encourage you to focus on your core mission and just buy one off the shelf.
Part four of an excellent series on OpenTelemetry, this post covers using custom span attributes to attach your own metadata to your traces.
Part five of the same OpenTelemetry series, this post provides more context on the design of OpenTelemetry collectors and their internal components. Great stuff.
We’ve probably all been guilty of this at times, though I do think it can be helpful to at least track these categories internally for learning and planning purposes.
Great to see Mimir add experimental support for other time-series metrics formats, including my old personal favorite, Graphite. :)
Although this isn’t strictly a monitoring-related article, it provides a ton of useful context around the Cilium CNI that you’ll probably want if you have to work with or support it (and makes a great intro before reading the next story).
Speaking of Cilium, Datadog has published this helpful overview of the CNI and the metrics you’ll want to keep an eye on.
I’m a little hesitant to open my wallet so that AWS can a) apply machine learning to all of my logs and b) receive enough logs in the first place to drive accurate anomaly detection. OTOH, if you’re already in the latter bucket and need to automate some insights, this might be just what you’re looking for.
“Grafana Mimir proxies are a collection of open source software projects that provide native ingest capability for third-party applications into Mimir.”
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor