There are some really great talks in here and something for everyone: on-call, Sensu, Icinga, tracing, and much more. Also a mixture of English and German talks.
Emergency services have some well-defined and tested incident command techniques, and they translate to software pretty well. The speaker, Brent Chapman, has spent years doing incident command for Google SRE, Black Rock City, CERT, and air search & rescue. It’s a really great talk.
There are some interesting tidbits in here about logging and events, including what is possibly the first log analysis exercise.
I’ll be thrilled when Prometheus ships with first-party HA support, but until then, Thanos looks like a great solution.
For the Icinga folks out there: automatic certificate scanning, monitoring, and some reporting. Very neat.
A followup to their previous article on unit testing formulas: you can do the same with your Prometheus alerts.
Lesson to be learned here: take a step back and really understand what you’re trying to achieve. Less infrastructure to maintain is better, and bonus points if less to maintain also comes with more capability.
If you work for a company with a large volume of business on Black Friday/Cyber Monday, you’re no stranger to all the prep work that leads up to showtime. I’ve had a front row seat for several of those exercises during my time as a staff consultant, and one thing that always gets me is that everyone spends most of the prep testing the business’s application and infrastructure, while little (if any) attention is paid to third-party dependencies. This past Black Friday/Cyber Monday, one such dependency caused an increase in load time for many websites. Don’t forget that you should be vetting your dependencies and vendors too.
If you weren’t able to make it to InfluxDays 2018 in San Francisco, the recording of CTO Paul Dix’s keynote is up. He talks at length about where Influx is heading and the new Flux query language.
In the category of “well, better late than never” or perhaps “You know how our partners are really good at a thing? Let’s do that thing too. Surely it won’t bother them,” AWS now has prebuilt dashboards for every service.
See you next week!
— Mike (@mike_julian)
Monitoring Weekly Editor