Issue 197

An eccentric mashup of monitoring posts this week, with an emphasis on metrics design and collection. And an old man yells at the cloud. 😂📢⛅

This issue is sponsored by:

Chronosphere

Your on-call holiday survival kit is here.

In the spirit of the holidays, Chronosphere has packaged 4 presents to help Engineering teams march towards reducing stress and avoiding burnout. Put your best foot forward (in style) while moving towards on-call experiences that suck less. Get your kit!

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Making a technical platform…

A unique look at the evolution (pun intended) of one company’s platform infrastructure, including Observability and related concerns.

Part 1
Part 2

Understanding Duplicate Samples and Out-of-order Timestamp Errors in Prometheus

This is a fascinating read on Prometheus out-of-order metrics, particularly if you’re a crufty old TSDB admin and former Graphite maintainer who argues this should have been supported(*) years ago. All teasing aside, it really is a very interesting post with plenty of relevant technical details and helpful bits for Prometheus admins.

(*) I acknowledge that all TSDB authors make compromises relevant to their respective requirements, but after having seen countless “new hot metrics engines” come and go, it feels inevitable to me that all competing TSDBs eventually settle on roughly the same feature set with the primary differences boiling down to implementation details and a select collection of bugs deemed too difficult to fix. Don’t @ me.

Phantom Metrics: Why Your Monitoring Dashboard May Be Lying to You

I’ve been guilty of “monitoring all the things” in the past, but we still hear the same question repeated year after year… “what should I be monitoring?” This post revisits numerous important considerations for metrics design and collection.

k8spacket — are your TLS connections inside the cluster still secure?

Monitoring for TLS versions and ciphers feels like a bit of an edge case, but I have no doubt there are security and compliance engineers in your org right now who would swoon over this.

How to monitor kube-controller-manager

I’ve genuinely enjoyed these monitoring deep-dives on Kubernetes components from Sysdig. Although much of this information is available in the official docs, it’s nice to see it aggregated for a specific controller, along with the metrics relevant to its health.

Running the OpenTelemetry Demo App on HashiCorp Nomad

A fun side project for one dev advocate turned into an OpenTelemetry tutorial with a collection of cloud-native tools. There’s a good chance I’m still working through this as you’re reading these words. 😆

Loop1

Do you need to monitor applications on-premises and in the cloud?

SolarWinds® Server & Application Monitor is designed to monitor your applications and their supporting infrastructure. Get continuous server monitoring, cross-stack correlation for your hybrid IT data, and the flexibility to monitor custom applications. Download a fully-functional 30-day free trial. (SPONSORED)

How to Use SkyWalking for Distributed Tracing in Istio?

A thorough guide for setting up your own distributed tracing infrastructure with Apache SkyWalking to capture observability data in an Istio service mesh. Honestly looks like a pretty painless way to get introduced to distributed tracing.

Uptime check of external sites & services

Uptime Kuma is one of those handy self-hosted services that nobody really talks about. We’ve covered it once before but it bears a reminder that this OSS project exists and remains a surprisingly competent alternative to paid health-check services.

How to create metrics that really matter?

A simple but relevant strategy for informing engineers’ choice of metrics instrumentation in their apps.

Grafana releases: New 2023 release schedule

Grafana Labs is adopting a monthly release cycle for the next year of Grafana releases. I don’t think this will necessarily impact users (from my experience, most folks update irregularly based on security or desired feature releases) as much as stabilize their internal processes, but it’s still good to see them set expectations within the community.

Tools

k8spacket/k8spacket

“packets traffic visualization for kubernetes”

Events

Monitorama PDX 2023 - June 26-28 (Portland, OR)

Monitorama is returning to Portland, OR next summer. The 2022 conference was a fantastic event and I look forward to seeing you all again in 2023.

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor