Issue 184

This was an unusually rich week for monitoring and reliability topics. If you’re a time-series nerd like me, I think you’ll really enjoy the benchmark comparison between VictoriaMetrics and Grafana Mimir. Enjoy! 📈☕✨

This issue is sponsored by:

Are you looking to modernize Log Analytics while controlling the cost?

DataSet is the cloud-native event data platform that enables teams to achieve petabytes of effortless scalability and real-time performance at a fraction of the cost. Get complete visibility into your entire stack and experience the DataSet difference for free.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

From Critical User Journey to SLO/SLIs

I’ve often preached to peers about the importance of monitoring and observability in the context of your product and users’ workflows. However, this is the first time I’ve heard of Critical User Journeys (CUJs); this strikes me as a fantastic way to frame this topic and to further the adoption of SLOs.

Monitoring our monitoring: how we validate our Prometheus alert rules

An impressively detailed look at how Cloudflare ensures that their Prometheus queries and alerts are as reliable as possible.

Ninja Van’s monitoring stack

I always enjoy seeing how different companies approach building their own monitoring stacks. This week we have an engineer from Ninja Van sharing the details of their architecture.

How out-of-order sample ingestion works in Grafana Mimir

An interesting read from Grafana on how they adapted Mimir to be compatible with time-series formats other than Prometheus.

When DevOps Dominoes Come Crashing Down

Fear of changing our systems can be paralyzing. This post looks at how we might better plan for change in a way that instills confidence rather than eroding it.

Test Automation Framework Reporting & Observability (Part 1)

Glad to see folks thinking about the intersection of test automation and observability. A really good primer to share with your favorite CI/CD engineers.

Grafana Mimir and VictoriaMetrics: performance tests

Engineers at VictoriaMetrics ran a performance benchmark against Grafana Mimir. Competition aside, it’s great to see teams across the two companies cooperating to ensure a level playing field. I’d love to see this continue as an ongoing series across the spectrum of TSDB systems out there.

P.S. I’m not at all surprised to see some of the data (e.g. memory use) from Mimir given their in-memory work to accommodate non-Prometheus metric formats.

Incident Review: Working as Designed, But Still Failing

An insightful yet concise review of a recent incident at Honeycomb.

How To Monitor Http traffic Of Third Party Service

I love discovering small, sharp tools like this one. If you ever need to monitor live HTTP traffic in a tcpdump-like manner, but captured asynchronously in logs, this is a good place to start.

Kubernetes Observability — Monitoring K8s Jobs

A quick but handy tip to keep in mind when monitoring your Kubernetes jobs.

Tools

jbittel/httpry

“httpry is a tool designed for displaying and logging HTTP traffic.”

Job Opportunities

Lead / Staff Database Reliability Engineer at Shopify (NA Remote)

Senior Platform Engineer at Replicated (Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor