SPECIAL EDITION: Q3 2024 Best of

Happy to be back this week with our quarterly “Best Of” issue, looking back at the most popular articles from the past few months. 💕📈

P.S. In light of the recent disaster in the southeastern United States caused by Hurricane Helene, please consider donating to a cause helping those in need. I personally recommend World Central Kitchen, but there are many other ways to support the people affected. Thank you.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Taming Logs

A solid list of best practices and considerations for log formatting, collection, structure and more.

Automated OpenTelemetry traces for Bash!

Excellent use of OpenTelemetry that should open up a lot of possibilities for platform teams and anyone who runs software they can’t easily instrument themselves.

Is It Time To Version Observability? (Signs Point To Yes)

Some real talk from Charity Majors on the successes (and failures) of “Observability 1.0”. There’s plenty to chew on here, and while I largely agree with her points, I still sense some bias towards particular types of developers and systems. Regardless, an excellent post.

perses/perses

“Facilitates a seamless ‘dashboards as code’ workflow by introducing an innovative and precisely defined dashboard definition model.”

Building a large-scale Observability Ecosystem

An insightful look at one company’s observability journey. This feels like a solid roadmap for anyone planning a similar transformation.

UI Improvements for Prometheus 3.0

Julius Volz offered a sneak peek at the UI improvements in the upcoming Prometheus 3.0 release. Tons of changes are landing soon; check out the pre-release if you’d like to test it and report any issues.

11 Takeaways from Observability Engineering Book

Some solid notes and highlights from “the” observability book. Unsurprisingly, there’s a bit of overlap with my own recent reading of the Learning OpenTelemetry book.

Building a cost-effective logging platform using Clickhouse for petabyte scale

It continues to impress me just how adaptable ClickHouse is to various workloads. Props to Zomato engineers for sharing their story with a useful level of detail.

Network Observability: Beyond Metrics and Logs

A reminder of the importance of network ~~monitoring~~ observability along with some good and bad examples of how it’s been done.

From Chaos to Clarity: Using Loki and Grafana to Tame Your Logs

If you haven’t already tried Loki for yourself, this article is a solid introduction and getting-started guide. I’d like to see the author write a follow-up post demonstrating querying and debugging in greater detail.

Perses is accepted as a CNCF Sandbox project

Looks like Grafana has some future competition in the Perses project. I appreciate their focus on a GitOps and CLI workflow, which has always felt like a bit of an afterthought for other dashboard projects.

Monitoring of Monitoring

Tips and considerations for anyone dealing with the traditional “who watches the watchers” conundrum.

Burn Rate Is a Better Error Rate

Helpful comparison of burn rates and error rates from Datadog. Props to the author for simplifying the math.
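For anyone new to the term, here is a minimal sketch of the common SRE definition: burn rate is the observed error rate divided by the error budget implied by the SLO target (1 - SLO target). This is the general formulation, not necessarily the exact math in the Datadog post, and the function name and sample numbers are hypothetical.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed (hypothetical helper).

    1.0 means the budget would be used up exactly at the end of the SLO window;
    values above 1.0 mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic, so nothing is being burned
    error_rate = errors / total       # observed fraction of failed requests
    error_budget = 1.0 - slo_target   # allowed fraction of failed requests
    return error_rate / error_budget


# 0.5% errors against a 99.9% SLO burns the budget about 5x faster than sustainable.
print(round(burn_rate(errors=50, total=10_000, slo_target=0.999), 2))  # prints 5.0
```

The same 0.5% error rate would be a burn rate of only 0.5 against a 99% SLO, which is why burn rate tends to make a better alerting signal than the raw error rate.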

Destroy on Friday! A Chaos Engineering Experiment - Part 1

Fun post from Honeycomb describing a recent chaos engineering experiment. I wish more companies would share these types of learnings.

VictoriaLogs: an overview, run in Kubernetes, LogsQL, and Grafana

Interesting look at VictoriaLogs, how it compares with Grafana Loki, and some of the missing bits that may hold back its adoption for now.

Monitoring in Kubernetes: Best Practices

A decent collection of monitoring concerns and best practices for Kubernetes. 50/50 chance this was written by AI, but it still has some solid points. 😅

What makes a good on-call shift system for DevOps engineer?

A look at some of the primary considerations for choosing an on-call service provider and a quick comparison of four of the most popular options.

Unveiling the Power Duo: osquery and osctrl

A deep dive on two popular open source projects for system introspection and monitoring. Chances are you’re familiar with osquery, but you might not be aware of the osctrl tool for centralized management of osquery agents.

otel-tui: A TUI Tool for Viewing OpenTelemetry Traces

Fun new project for interacting with OTel traces inside the terminal. Love it!

OpenTelemetry and vendor neutrality: how to build an observability strategy with maximum flexibility

The ubiquity of OpenTelemetry has given users more power than ever before to avoid the hassles of vendor lock-in. But it’s not foolproof; there are still steps you can and should take to ensure that you’re using OTel effectively and giving yourself flexibility to adapt in the future.

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor