Issue #086

Latest Articles on monitoring.love

What driving an old jalopy taught me about monitoring

Driving old, beat-up cars is both a treat and a nightmare, especially when it comes to figuring out why they’ve stopped working (this time). In many ways, diagnosing issues with any old car feels a lot like monitoring for and diagnosing failures in software.

From The Community

How Safe is Your Home’s Air? The Internet of Things and Air Quality Monitoring during Wildfires

For those who didn’t know, California has been on fire for the past couple of weeks in the most devastating wildfire in the state’s history. The scene outside has been, well, apocalyptic. Fred Moyer at Circonus, who lives just down the road from me, wrote up Take 2 of his air quality analysis using some IoT sensors, a metrics tool (Circonus, of course), and a bit of toothpicks-and-glue for good measure. Final verdict: don’t go outside, San Franciscans.

Observability at Scale: Building Uber’s Alerting Ecosystem

On a much lighter note, Uber’s alerting service is interesting. It’s mostly in-house tools put together in a pipeline, though they do rely on Graphite’s query language for metric queries. There’s some built-in alert dedupe going on, too.
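The post doesn’t hand over code, but the general idea of alert dedupe is easy to sketch. Here’s a minimal Python sketch, assuming an alert is identified by its name plus its labels and that repeats inside a fixed time window should be suppressed. The `Alert` and `Deduper` names and the window value are hypothetical; this is not Uber’s implementation or API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted (label, value) pairs


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape, for illustration only.
    name: str
    labels: Labels
    timestamp: float  # seconds since epoch


class Deduper:
    """Suppress repeats of the same alert (same name and labels) within a window.

    A sketch of the general technique, not Uber's implementation.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        # Last time a notification actually went out for each (name, labels) key.
        self._last_sent: Dict[Tuple[str, Labels], float] = {}

    def should_notify(self, alert: Alert) -> bool:
        key = (alert.name, alert.labels)
        last = self._last_sent.get(key)
        if last is not None and alert.timestamp - last < self.window:
            return False  # duplicate inside the window: drop it
        self._last_sent[key] = alert.timestamp
        return True
```

With a 300-second window, a second `HighLatency` alert for the same host two minutes later is dropped, while the same alert for a different host still goes through. Keying on labels as well as the name is what keeps dedupe from hiding genuinely separate problems.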

Why Use K-Means for Time Series Data? (Part Three)

Get up to speed with Part 1 and Part 2. Once you’ve got that, come back for Part 3 and a bit more head-spinning.

Time of day based notifications with Prometheus and Alertmanager

You know how it’d be great to only send alerts during certain times of day? Turns out, that’s an open problem with Prometheus, but this article has a good approach that relies on PromQL. It’s a bit, well, involved, but it works. Also: timezones are hard.
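The article does its work in PromQL, whose `hour()` function reports the hour in UTC. That means a fixed UTC range drifts by an hour whenever daylight saving time changes. Here’s a minimal Python sketch of a zone-aware “business hours” check that shows why. It assumes 9:00 to 17:00 in America/Los_Angeles and uses the standard-library `zoneinfo` module (Python 3.9+). `in_business_hours` is a hypothetical helper, not part of Prometheus or Alertmanager.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def in_business_hours(ts_utc: datetime, tz_name: str = "America/Los_Angeles",
                      start_hour: int = 9, end_hour: int = 17) -> bool:
    """Return True if an aware timestamp falls within [start_hour, end_hour) local time.

    Converting to the named zone (rather than applying a fixed UTC offset)
    keeps the window correct across daylight saving time changes.
    """
    if ts_utc.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    local = ts_utc.astimezone(ZoneInfo(tz_name))
    return start_hour <= local.hour < end_hour


# 16:30 UTC is 08:30 PST in November but 09:30 PDT in July.
print(in_business_hours(datetime(2018, 11, 20, 16, 30, tzinfo=timezone.utc)))
print(in_business_hours(datetime(2018, 7, 20, 16, 30, tzinfo=timezone.utc)))
```

The same UTC time lands on opposite sides of the window depending on the season, and that is exactly the kind of thing a hand-rolled UTC hour range gets wrong.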

Unit testing rules with Prometheus

Speaking of Prometheus, the latest promtool adds the ability to unit test your alerting and recording rules, so you can check that the PromQL expressions in them do what you intend.

Get Application Performance Metrics on Python Flask With Elastic APM on Kibana and Elasticsearch

The author walks us through instrumenting a Python app with Elastic APM. For some reason, I’ve always had it in my head that Elastic’s APM was a paid product, but it’s actually not. You really can get a free APM tool from them. That’s kinda cool.

Observability Using Abstracted IO

I think it’s interesting to see companies discussing abstraction layers for monitoring tools lately. With so many specialist tools, it’s become a huge pain in the ass to manage instrumentation without redoing it for each vendor you use.

Garbagedog: How eero does continuous monitoring of Java garbage collection

Aside from having an awesome name, this tool from eero does exactly what it says: it makes monitoring Java GC much easier.

How to deal with the seasonality of a market?

Hope you like math, because there’s a whole bunch of it. I’m including this mainly because of its implications for capacity planning (for those of you who do capacity planning exercises).
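For the capacity-planning angle, one of the simplest seasonality tools is a seasonal index: the average value at each position in the cycle divided by the overall average. Here’s a minimal Python sketch, assuming a series made of complete cycles with a known period (for example, 7 for daily data with a weekly pattern). `seasonal_indices` is a hypothetical helper and not the method from the linked article.

```python
from statistics import mean
from typing import List, Sequence


def seasonal_indices(series: Sequence[float], period: int) -> List[float]:
    """Ratio-to-overall-mean seasonal index for each position in the cycle.

    Assumes len(series) is a whole number of cycles of length `period`.
    An index of 1.5 means that slot typically runs 50% above average.
    """
    if period <= 0 or len(series) % period != 0:
        raise ValueError("series must contain whole cycles of `period` points")
    overall = mean(series)
    # Average every value that falls in the same slot of the cycle.
    slot_means = [mean(series[i::period]) for i in range(period)]
    return [m / overall for m in slot_means]
```

Dividing each observation by its slot’s index gives a deseasonalized series, which makes the underlying growth trend easier to see when you’re sizing capacity. The peak slot’s index also tells you roughly how much headroom above the average you need.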

Incidents As We Imagine Them Versus How They Actually Happen

I had the good fortune to see the inimitable John Allspaw give this talk live at this year’s PagerDuty Summit, and it’s just as good the second time around, maybe even better. I strongly recommend watching this video and letting John turn your understanding of incidents on its head.

Open-Sourcing Our Incident Response Training | PagerDuty

Speaking of incidents, PagerDuty just open-sourced their incident response training to go along with their public incident response documentation.

Travis CI <3 Honeycomb

I’m including this not because it’s a bunch of actionable stuff, but because a lot of you keep asking me some variation of, “Why would I use Honeycomb? I don’t understand what it’s for.” They really are building something we’ve not seen before, so I think it’s worth linking to and talking about. (Yeah, it’s true that many of you are running systems that wouldn’t benefit from tools like this, and that’s totally okay too.)

Jobs

Technical Evangelist – Wavefront – Location Flexible

I had the pleasure of speaking with the hiring manager and it sounds like a really awesome gig. If you’re into Ops/SRE/DevOps and love monitoring, click through to check it out and apply.

Want your job listed here? Why not submit a post to the job board? It’s only $199/ad for 30 days.

See you next week!

— Mike (@mike_julian)
Monitoring Weekly Editor