Issue 162

A fun and varied week of articles, with something for everyone. Don’t miss Uber Engineering’s article on how they minimized P99 latency spikes for CPU-heavy workloads. Oh, and there are only a few days left to submit a talk proposal for Monitorama PDX 2022. Hope to see you there!

This issue is sponsored by:

Timescale

🎓 Learn OpenTelemetry tracing with this lightweight microservices demo

The Promscale team has built a lightweight, easy-to-deploy microservices demo instrumented with OpenTelemetry so you can experiment with tracing. The demo also includes six pre-built Grafana dashboards for monitoring upstream and downstream dependencies, throughput, latency, and error rates. Check out this blog post for a complete walkthrough!

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Miro Data Engineering team’s journey to monitoring

What happens when your company grows by leaps and bounds but your infrastructure falls behind? Miro shares their story of scaling up their monitoring architecture and processes to keep up with their data platform.

Observability

One engineer’s hot take on what Observability is, or is not. Grab your popcorn.

Measuring Latency Overhead with Own Time

How Airbnb engineers isolated the runtime latency overhead of their service mesh and captured it in a single metric.

99% to 99.9% SLO: High Performance Kubernetes Control Plane at Pinterest

An excellent article on how Pinterest has increased the reliability and efficiency of their Kubernetes-based control plane.

How relabeling in Prometheus works

A handy guide from Grafana Labs on how Prometheus’ relabeling works and when (and how) to use it effectively.

Avoiding CPU Throttling in a Containerized Environment

This isn’t strictly monitoring-related, but it feels like a big deal for anyone running CPU-heavy workloads in cgroups. Plus, it’s worth it just to see those P99 spikes go away. 😁

Why Don’t You Use …

I love this article from Brendan Gregg on why we do (or don’t) choose certain products. Frankly, it has the makings of a great checklist for evaluating any potential new vendor.

Metricbeat and Filebeat on RKE2

If you’re considering RKE2 but already have an investment in Elasticsearch and Kibana, here’s a pattern for using them in place of Rancher’s default options.

Everything I Needed to Know About Observability, I Learned from ‘Bewitched’

Observability lessons from a fifty-year-old sitcom.

What is API Observability?

I don’t really grok why the author chose to frame this around API observability, but it’s still a solid standalone article on observability in general. I particularly appreciate the background on monitoring software and principles, the discussion of signals, and the closing survey of today’s “state of the art” in observability tooling.

Longhorn Storage for High Availability of Prometheus pods on Kubernetes

How one engineer uses Longhorn distributed block storage for their HA Prometheus nodes on Kubernetes.

Events

Call for Participation - Monitorama PDX 2022

Monitorama is returning to Portland, OR this summer. The organizers have recently opened their CFP for a limited number of speaking slots. The deadline for submissions is March 31, 2022.

Job Opportunities

Senior Platform Engineer at Replicated (Remote)

Customer Reliability Engineer at Replicated (Remote)

Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims to identify savings opportunities worth a 15-20% cost reduction through tweaks to your architecture, or your money back. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor