Fun week full of stories on systems design, instrumentation, and problem solving with observability tooling. Enjoy! 🧠🙏😎

This issue is sponsored by:

ObservabilityCON 2024 registration is open. Whether you’re in an empire state of mind or an observability bind, join us for bagels and logs in New York City, September 24-25. Connect with Grafana Labs experts and preview LGTM Stack releases and solutions. Register now and join us in person.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Building a large-scale Observability Ecosystem

An insightful look at one company’s observability journey. This feels like a solid roadmap for anyone planning a similar transformation.

Managing Critical Alerts through PagerDuty’s Event Rules

Sometimes it just takes a little extra context to get folks to care a bit more about an alert and take preventative action.

OpenTelemetry Frontend Demo

Love to see OpenTelemetry making inroads into front-end telemetry. Looks like much of this post was inspired by a recent talk at KubeCon EU.

Solving large logs with ClickHouse

A deep dive into what it took for one vendor to reduce query times on large logs in ClickHouse.

otel-tui: A TUI Tool for Viewing OpenTelemetry Traces

Fun new project for interacting with OTel traces inside the terminal. Love it!

Getting started with Grafana: best practices to design your first dashboard

Just like alerting, there is good and bad dashboard design. This post covers some of the basic necessities for any effective dashboard.

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl to analyze, collect, process, and route all IT and security data, delivering the choice, control, and flexibility required to adapt to their ever-changing needs. (SPONSORED)



Timeseries Indexing at Scale

Fascinating look at the challenges of indexing time-series data at “Datadog Scale”.

Observability using OpenSearch + Grafana

A cautionary tale (with pointers) for anyone considering Grafana over Kibana with OpenSearch.

A Journey as an Incident Commander: The Unsung Hero of Crisis Management

Having worked for organizations where nobody really knew (or cared to know) how to IC effectively, this one hits close to home. I’d have loved this post to dive deeper into specifics, but it’s still a good reference to share with a team that questions the importance of this role (or their ability to perform it).

Monitoring Instance Metrics in a Golang program

Although I think the example here isn’t the best way to collect related machine metrics, it’s still a useful pattern for Go developers learning how to instrument their code.

Tools

ymtdzzz/otel-tui

A terminal OpenTelemetry viewer inspired by otel-desktop-viewer

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor