This issue is sponsored by:

Everyone wants stats-based alerting, but it’s not always straightforward to do. InfluxDB’s Holt-Winters support is pretty great though, and easy to use. Learn more about it here.

My thanks to InfluxData for their support of Monitoring Weekly.

From The Community

Why measuring accuracy is hard (and very important)! - Part 1, Part 2, Part 3, Part 4

I love this article, especially how it starts out: what, exactly, does accuracy mean? From the article: “But what is accuracy then? Is the algorithm expected to be 73% accurate when it reports a 73% confidence? What does accuracy mean in this situation? Does accuracy mean the number of answers with more than 50% confidence that were correct? Is 50% the right threshold? How do we count the null answers? What if both yes and no have less than the required confidence? How is that counted?”
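To make those questions concrete, here is a minimal Python sketch (not from the article) of how two of the choices it raises, the confidence threshold and how null answers are counted, change the reported accuracy for the same predictions. The function name `thresholded_accuracy`, the tuple layout, and the sample data are all made up for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record layout: (answer, confidence, truth).
# answer is None when the algorithm gives a null answer.
Prediction = Tuple[Optional[bool], float, bool]


def thresholded_accuracy(
    preds: Sequence[Prediction],
    threshold: float = 0.5,
    count_nulls_as_wrong: bool = True,
) -> float:
    """Accuracy under one particular, debatable set of counting rules."""
    correct = 0
    total = 0
    for answer, confidence, truth in preds:
        if answer is None or confidence <= threshold:
            # Null or low-confidence answer: either a miss or left out entirely.
            if count_nulls_as_wrong:
                total += 1
            continue
        total += 1
        if answer == truth:
            correct += 1
    return correct / total if total else 0.0


# Made-up sample: one confident hit, one confident miss,
# one low-confidence answer, and one null answer.
sample = [
    (True, 0.9, True),
    (False, 0.8, True),
    (True, 0.4, True),
    (None, 0.0, False),
]

strict = thresholded_accuracy(sample, count_nulls_as_wrong=True)
lenient = thresholded_accuracy(sample, count_nulls_as_wrong=False)
```

With this sample, the strict rule counts four answers and the lenient rule counts only two, so the same algorithm reports two different accuracy figures depending on a bookkeeping decision.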

OpenTracing: Distributed Tracing’s Emerging Industry Standard

The folks at SemaText have written a pretty great five-part series on OpenTracing.

How to collect, standardize, and centralize Golang logs

From the article: “This post will show you some tools and techniques for managing Golang logs. We’ll begin with the question of which logging package to use for different kinds of requirements. Next, we’ll explain some techniques for making your logs more searchable and reliable, reducing the resource footprint of your logging setup, and standardizing your log messages.”


Super neat visualization for understanding communication patterns in distributed systems, but if this doesn’t make you start regretting breaking up your monolith, I don’t know what will.

Setting Up Your PagerDuty for Sweet Victory

I don’t normally link to what amounts to “how to use our product” articles, but given that roughly 99% of you probably use PagerDuty, it’s relevant. Especially because the organization and setup aren’t quite as intuitive or obvious as they probably could be.

phenomenal outages

Because outages can also be beautiful.

eduardobaitello/kubelogs: Interactively dump logs from multiple Kubernetes containers

From the site: “It is a bash script that uses your current kubectl context to interactively select namespaces and multiple pods to download logs from. It basically runs kubectl logs in a loop for all containers, redirecting the logs to local files.”
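The loop described there can be sketched roughly as follows. This is not the kubelogs script itself (that is bash); it is a Python illustration of the same idea, assuming `kubectl` is on your PATH and that you already know which pods and containers you want. The helper names `log_command`, `log_path`, and `dump_logs`, and the output file naming scheme, are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def log_command(namespace: str, pod: str, container: str) -> List[str]:
    """Build the kubectl invocation for one container's logs."""
    return ["kubectl", "logs", pod, "-c", container, "-n", namespace]


def log_path(out_dir: Path, namespace: str, pod: str, container: str) -> Path:
    """Hypothetical naming scheme: one local file per container."""
    return out_dir / f"{namespace}_{pod}_{container}.log"


def dump_logs(
    namespace: str,
    containers_by_pod: Dict[str, List[str]],
    out_dir: Path,
) -> List[Path]:
    """Run kubectl logs for every container, redirecting output to files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pod, containers in containers_by_pod.items():
        for container in containers:
            path = log_path(out_dir, namespace, pod, container)
            with path.open("wb") as fh:
                # Uses the current kubectl context, as the script is said to do.
                subprocess.run(
                    log_command(namespace, pod, container),
                    stdout=fh,
                    check=True,
                )
            written.append(path)
    return written
```

Calling something like `dump_logs("default", {"web-0": ["nginx", "sidecar"]}, Path("logs"))` would write one file per container; the interactive namespace and pod selection that the real script offers is left out here.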

GrafanaCon L.A. Recap: Grafana 6.0, LGTM, and More!

Just a recap and a link to all of the videos from the recent GrafanaCon in Los Angeles.

Shift Changes, Updates, and the On-Call Architecture in Space Shuttle Mission Control

My friend Thai Wood over at Resilience Roundup walks us through how on-call functions at NASA.

MonitorSF Meetup February 2019 - Effective Service Level Objectives

If you’re wondering how good your SLOs are, you should watch this.

This issue is sponsored by:

We’re hosting an online workshop on Tue 3/26 at 9:00am PT on building, deploying and monitoring containers. Sylvia Fronczak (Software Engineer) and Dave McAllister (Scalyr Community Guy) will show live code and examples to accompany container orchestration concepts. They’ll also show how to get started with monitoring containers. Sign up for the online workshop.


LogicMonitor Level Up Conference - June 24-26, 2019 - Austin, TX USA

For the LogicMonitor fans among you, the LogicMonitor Level Up conference is this June. I’ll be speaking, so come hang out/heckle!

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor