Issue #087

From The Community

Open Source Monitoring Conference 2018 – Recordings Now Available

There are some really great talks in here and something for everyone: on-call, Sensu, Icinga, tracing, and much more. Also a mixture of English and German talks.

Mastering Outages with Incident Command for DevOps: Learning from the Fire Department

Emergency services have some well-defined and tested incident command techniques, and they translate to software pretty well. The speaker, Brent Chapman, has spent years doing incident command for Google SRE, Black Rock City, CERT, and air search & rescue. It’s a really great talk.

History and irony of logging (in)

There are some interesting tidbits in here about logging and events, including what is possibly the first log analysis exercise.

Making Prometheus more awesome with Thanos

I’ll be thrilled when Prometheus ships with first-party HA support, but until then, Thanos looks like a great solution.

Icinga X.509 Module

For the Icinga folks out there: automatic certificate scanning, monitoring, and some reporting. Very neat.

Unit testing alerts with Prometheus

A follow-up to their previous article on unit testing formulas: you can do the same with your Prometheus alerts.

Do we need this ELK stack?

Lesson to be learned here: take a step back and really understand what you’re trying to achieve. Less infrastructure to maintain is better, and bonus points if less to maintain also comes with more capability.

Black Friday performance: Third-party outage strikes again

If you work for a company with a large volume of business on Black Friday/Cyber Monday, you’re no stranger to all the prep work that leads up to showtime. I’ve had a front row seat for several of those exercises during my time as a staff consultant, and one thing that always gets me is that everyone spends most of the prep testing the business’s application and infrastructure, while little (if any) attention is paid to third-party dependencies. This past Black Friday/Cyber Monday, one such dependency caused an increase in load time for many websites. Don’t forget that you should be vetting your dependencies and vendors too.

InfluxDays 2018 – Paul Dix Keynote

If you weren’t able to make it to InfluxDays 2018 in San Francisco, the recording of Paul Dix’s (CTO) keynote is up. He talks at length about where Influx is heading and the new Flux query language.

Amazon CloudWatch Introduces Automatic Dashboards to Monitor all AWS Resources

In the category of “well, better late than never” or perhaps “You know how our partners are really good at a thing? Let’s do that thing too. Surely it won’t bother them,” AWS now has prebuilt dashboards for every service.

See you next week!

— Mike (@mike_julian)
Monitoring Weekly Editor