Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

LightStep is one of the new breed of tools out there I'm excited about. Designed with modern, high-scale, high-traffic architectures in mind, LightStep makes it easy to spot, diagnose, and solve performance issues. Check it out here. (SPONSORED)

DevOps is Dead with James Turnbull - Real World DevOps

Probably the most common question I received when I told people I was writing a book about monitoring was, “Have you read James Turnbull’s book?” I’m putting that to rest with a delightful conversation with James Turnbull on a variety of topics, including which of his own books is his favorite, some not-so-subtle digs at Kubernetes, and why James thinks DevOps is dead.

Elasticsearch Guide

A pretty in-depth guide to Elasticsearch, covering how it works, how to manage it, how to monitor it, and more.

Tinder & Grafana: A Love Story in Metrics and Monitoring

TL;DR: Prometheus, Grafana, and some fun, apropos jokes.

How to monitor Golden signals in Kubernetes

Because everyone can generally do with a refresher course on the Golden Signals from time to time.

How Not to Fail at Visualization

These are really great examples of good and bad visualizations with Grafana. Highly recommend you read this and take notes.

The On-Call Game

This is super fun and very well-designed. I would absolutely love to have more scenarios built out, especially ones that are trickier.

Monitoring TLS Certificates with Telegraf

Someone remarked recently that monitoring SSL certificate expiration is probably the quintessential job of Nagios… and you know what? I didn't even argue. So I'm happy to see how easy it is to do this with Telegraf.

Ever wondered why your CEO doesn't give a toss about technical debt?

The folks at Raygun set out to learn why, interviewing the executive leadership at Xero, Pushpay, and Vend to find out what’s really going on and how they think about engineering effort and software quality. (SPONSORED)

3 Awesome Visualization Techniques for every dataset

I love good visualizations. I also love Python. This article gives me both. <3

From events to Grafana annotation

This is a super neat tool for making Grafana annotations a lot easier to create.

Worth a Look: Public Grafana Dashboards

I've linked to a few of these before, but there are some new ones I didn't know about. They might give you some ideas for dashboard organization.

Alerting on SLOs like Pros

A tale of SLOs at SoundCloud. Very useful stuff in here.

Is Operational Sympathy A (Good) Thing?

I absolutely love this article for two big reasons: 1) I disagree with a few of the main points, and 2) the author is clearly way smarter than me. Highly recommended read.

Distributed Tracing — we’ve been doing it wrong

A bit of a teaser: "Tracing might still remain something that, once deployed, doesn't unlock enough value to be of any practical use in the most commonly used debugging scenarios." 100% agreed: the last time I suggested to a vendor that tracing wasn't terribly valuable, I ended up in an hour-long debate. Glad I'm not the only one.

A Practitioner’s Guide to System Dashboard Design Part 1

Following up on the conversation Cory and I had on Real World DevOps, here's his article on the foundations of dashboard design for operations.

20 Patterns to Watch for in Engineering Teams

GitPrime’s new book draws together some of the most common software team dynamics, observed in working with hundreds of enterprise engineering organizations. Actionable insights to help you debug your development process with data. Get Your Copy. (SPONSORED)

Logs vs. metrics: a false dichotomy

From the article: "This post is about how 'logs vs. metrics' is a false dichotomy, and how thinking in this binary prevents us from seeing simpler ways to monitor our systems."

Why do engineers not care about application monitoring?

The title may come across as clickbait-y to some of you, but the author is describing exactly where most of the industry is today. By virtue of subscribing to this newsletter, you've self-selected into a group with a higher level of awareness of and interest in monitoring. The truth is that while monitoring has come a long way, we've still got a long way to go.

Monitoring everything…?

I won’t ruin the punchline for you, but you would be shocked how often this occurs even in multi-million dollar companies.

Why are software developers confused by Kafka and immutable logs?

This article starts off with some musings about overly complex software architecture, but it gets really good about halfway in, when the Kafka material shows up. Stick with it; I promise it's worth the read.

Nobody wants to be woken up at 4 am

The folks at Farfetch discuss their monitoring journey and current stack too. TL;DR: Grafana, Thanos, Prometheus, Alertmanager,

Chernobyl DevOps: Software Engineering, Disaster Management, and Observability

Following on the heels of HBO's recent miniseries on Chernobyl, the author of this article relates the incident to software engineering. It also has lots of really interesting details about the incident that I didn't know.

Hands-On Infrastructure Monitoring with Prometheus

A new book on Prometheus is out and available for purchase.

Join DevOps expert and CEO of Blue Matador, Matthew Barlocker, for a CloudWatch Guided Tour Webinar on either July 25th or July 31st. You'll learn about CloudWatch concepts, alarms, metrics, best practices, and more. Save Your Spot. (SPONSORED)

How we implemented RED and USE metrics for monitoring

The folks at THRON discuss their monitoring approach and current stack. There's some good stuff in here about instrumentation methodologies (RED, USE, Golden Signals).

Grafana Tutorial: How to Create Kiosks to Display Dashboards on a TV

Because everyone loves dashboards on big ass TVs.

The CASE Method: Better Monitoring For Humans

Everyone loves a good framework, and this is a super good one, focused on alert design.

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor