An eccentric mashup of monitoring posts this week, with an emphasis on metrics design and collection. And an old man yells at the cloud. 😂📢⛅
This issue is sponsored by:
Your on-call holiday survival kit is here.
In the spirit of the holidays, Chronosphere has packaged 4 presents to help Engineering teams march towards reducing stress and avoiding burnout. Put your best foot forward (in style) while moving towards on-call experiences that suck less. Get your kit!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
A unique look at the evolution (pun intended) of one company’s platform infrastructure, including Observability and related concerns.
This is a fascinating read on Prometheus out-of-order metrics, particularly if you’re a crufty old TSDB admin and former Graphite maintainer who argues this should have been supported(*) years ago. All teasing aside, it really is a very interesting post with plenty of relevant technical details and helpful bits for Prometheus admins.
* I acknowledge that all TSDB authors make compromises relevant to their respective requirements, but after having seen countless “new hot metrics engines” come and go, it feels inevitable to me that all competing TSDBs eventually settle on roughly the same feature set with the primary differences boiling down to implementation details and a select collection of bugs deemed too difficult to fix. Don’t @ me.
I’ve been guilty of “monitoring all the things” in the past, but we still hear the same question repeated year after year… “what should I be monitoring?” This post revisits numerous important considerations for metrics design and collection.
Monitoring for TLS versions and ciphers feels like a bit of an edge case, but I have no doubt there are security and compliance engineers in your org right now who would swoon over this.
I’ve genuinely enjoyed these monitoring deep-dives on Kubernetes components from Sysdig. Although much of this information is available in the official docs, it’s nice to see it aggregated for a specific controller, along with the metrics relevant to its health.
A fun side project for one dev advocate turned into an OpenTelemetry tutorial with a collection of cloud-native tools. There’s a good chance I’m still working through this as you’re reading these words. 😆
Do you need to monitor applications on-premises and in the cloud?
SolarWinds® Server & Application Monitor is designed to monitor your applications and their supporting infrastructure. Get continuous server monitoring, cross-stack correlation for your hybrid IT data, and the flexibility to monitor custom applications. Download a fully-functional 30-day free trial. (SPONSORED)
A thorough guide to setting up your own distributed tracing infrastructure with Apache SkyWalking to capture observability data in an Istio service mesh. It honestly looks like a pretty painless introduction to distributed tracing.
Uptime Kuma is one of those handy self-hosted services that nobody really talks about. We’ve covered it once before, but it bears repeating that this OSS project exists and remains a surprisingly competent alternative to paid health-check services.
A simple but relevant strategy for informing engineers’ choice of metrics instrumentation in their apps.
Grafana Labs is adopting a monthly release cycle for the next year of Grafana releases. I don’t think this will affect users much (in my experience, most folks update irregularly, driven by security fixes or a feature they want); it seems more likely to stabilize Grafana Labs’ internal processes. Still, it’s good to see them set expectations within the community.
“packets traffic visualization for kubernetes”
Monitorama is returning to Portland, OR next summer. The 2022 conference was a fantastic event and I look forward to seeing you all again in 2023.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor