Announcing my new video course: Monitor Anything

How do you improve monitoring, specifically? Where do you even start? Worse: how do you know you’re done? If this resonates, I’ve got something in the works you’re going to love: a foolproof framework for how to monitor any app, service, or infrastructure. Read more about it and pre-order the course here.



This issue is sponsored by:

Rollbar – Free Guide: Low-Risk Continuous Delivery

Adopting Continuous Delivery can bring a lot of benefits, but deploying to production can be filled with uncertainty. Learn how to reduce the risks with the right culture, architecture, and tooling to deploy early and often. Check out this free guide as we explore solutions.

Articles & News

How to Develop an Incident Response Plan for Your SaaS Business & How to Develop An Incident Response Checklist for Your SaaS Business – Threat Stack

This pair of articles does a great job of introducing foundational security incident response practices, topics I personally think deserve significantly more discussion within the realm of monitoring (it’s a damn shame that the security monitoring field is populated with FUD and big-E enterprise vendors with something they desperately want to sell you).

Elasticsearch Performance Tuning

If you run an ELK cluster, you’ve no doubt run into Elasticsearch performance challenges. This article makes a few suggestions for basic Elasticsearch performance improvements, such as sizing memory correctly to avoid issues with the JVM heap.
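
For a feel of what heap sizing looks like in practice, here is a minimal Python sketch of Elastic's general guidance, which is not necessarily what the linked article recommends: keep the minimum and maximum heap equal, give the heap no more than half of the machine's RAM, and stay under the compressed object pointer threshold. The function names are made up for this sketch, and the 26 GB ceiling is a conservative assumption, since the exact threshold varies by system.

```python
def suggested_heap_gb(total_ram_gb, oops_ceiling_gb=26.0):
    """Return a heap size in GB: half of RAM, capped at the assumed ceiling."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, oops_ceiling_gb)


def jvm_options(total_ram_gb):
    """Render -Xms/-Xmx flags with identical values, rounded down to whole GB."""
    # Never go below 1 GB, so tiny machines still get a valid flag.
    heap = max(1, int(suggested_heap_gb(total_ram_gb)))
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]
```

Calling `jvm_options(16)` yields flags for an 8 GB heap, which leaves the other half of RAM for the filesystem cache that Lucene relies on.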

Simplifying InfluxDB: Shards and Retention Policies

Ever been confused by InfluxDB’s internals around retention? I know I have. This article does a wonderful job of explaining how it all works.

Monitoring and Alerting for A/B Testing — Detecting Problems in Real Time

It’s hard enough detecting failures in web systems normally, but the folks at Walmart Labs have gone a few steps further: detecting failures within web experiments, on live traffic, in real time. Definitely a non-trivial problem and a fascinating topic.

Highlights of Monitorama PDX day 1

The post-Monitorama writeups are starting to roll in, after a good weekend’s worth of rest. I totally agree with this writeup, too: Logan McDonald’s talk was one of my favorites of the conference. While we’re waiting on the 2018 videos to be posted, you can watch the lightning talk version she did last year.

Guarding Your Packages with Snyk and Icinga

Snyk is a really neat product, but there’s one flaw: its alerts aren’t tied into your production alerting mechanisms. This article resolves that for Icinga by wrapping a Snyk API call with a custom Icinga plugin (the code is in the article). It should be easily adaptable to other monitoring systems as well.
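
If you want to adapt the idea to another Nagios-compatible system, the plugin contract is small: print one status line (optionally followed by performance data after a pipe) and exit with 0, 1, 2, or 3 for OK, WARNING, CRITICAL, or UNKNOWN. The Python sketch below is not the article's code. `fetch_vulnerability_counts` is a hypothetical stand-in for the Snyk API call, and both its return shape and the severity-to-status mapping are assumptions.

```python
# Standard plugin exit codes understood by Icinga and Nagios-compatible systems.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def fetch_vulnerability_counts():
    """Hypothetical stand-in for the Snyk API call.

    The returned shape (counts keyed by severity) is an assumption made
    for this sketch, not Snyk's actual response format.
    """
    return {"high": 0, "medium": 2, "low": 5}


def evaluate(counts):
    """Map severity counts to a plugin exit code and a one-line status message."""
    high = counts.get("high", 0)
    medium = counts.get("medium", 0)
    low = counts.get("low", 0)
    if high > 0:
        status = CRITICAL
    elif medium > 0:
        status = WARNING
    else:
        status = OK
    # Text after the pipe is performance data in label=value form.
    message = (
        f"SNYK {STATUS_NAMES[status]} - {high} high, {medium} medium, {low} low"
        f" | high={high} medium={medium} low={low}"
    )
    return status, message


def main():
    """Print the status line and return the exit code for the monitoring system."""
    try:
        status, message = evaluate(fetch_vulnerability_counts())
    except Exception as exc:  # any failure to fetch is reported as UNKNOWN
        status, message = UNKNOWN, f"SNYK UNKNOWN - {exc}"
    print(message)
    return status
```

As an actual plugin, the script would end by passing the result of `main()` to `sys.exit` so the monitoring system sees the exit code.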

Synthetic Monitoring: A Case Study of the Meltwater API

This case study integrates a test suite with statsd, Grafana, OpsGenie, Slack, and some custom HTML reports. It’s really neat and gives me a lot of ideas.
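
The statsd half of that kind of integration is simple enough to sketch. statsd accepts plain-text lines of the form `name:value|type` over UDP (port 8125 by default), where `c` is a counter and `ms` is a timer. The Python below is a minimal illustration, not Meltwater's implementation. The metric names and the `suite_metrics` helper are made up for this example.

```python
import socket


def format_metric(name, value, metric_type):
    """Render one metric in the statsd line protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def suite_metrics(suite, passed, failed, duration_ms):
    """Build the statsd lines for one run of a test suite.

    The metric naming scheme is an assumption made for this sketch.
    """
    prefix = f"synthetic.{suite}"
    return [
        format_metric(f"{prefix}.passed", passed, "c"),          # counter
        format_metric(f"{prefix}.failed", failed, "c"),          # counter
        format_metric(f"{prefix}.duration", duration_ms, "ms"),  # timer
    ]


def send(lines, host="127.0.0.1", port=8125):
    """Send each line as its own UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
```

After a test run you would call something like `send(suite_metrics("api", 10, 1, 4200))`. Grafana can then graph the resulting series, and alerting tools can act on the failure counter.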

Exploring, understanding and monitoring macOS activity with osquery

It’s only a slide deck rather than a video, but it’s written in a way that’s still really useful. I’ve been itching for a reason to use osquery for a while, and this deck just reminds me of that.

A Postmortem Template - Michael Kehoe

Hat tip to SRE Weekly for pointing me to this one: a very detailed and well-thought-out postmortem template. The only thing I’d personally change is the “root cause” bit to “contributing factors”. After all, there’s no such thing as a root cause.

See you next week!

– Mike (@mike_julian), Monitoring Weekly Editor