Issue 087
This issue is sponsored by:
Got a neat product you think doesn’t get enough attention? An event you think everyone should know about? Something else entirely? Sponsorships are open to all product types and industries–not just those with a monitoring product.
Learn more about sponsorship opportunities in Monitoring Weekly
From The Community
Open Source Monitoring Conference 2018 - Recordings Now Available
There are some really great talks in here and something for everyone: on-call, Sensu, Icinga, tracing, and much more. Also a mixture of English and German talks.
Mastering Outages with Incident Command for DevOps: Learning from the Fire Department
Emergency services have some well-defined and tested incident command techniques, and they translate to software pretty well. The speaker, Brent Chapman, has spent years doing incident command for Google SRE, Black Rock City, CERT, and air search & rescue. It’s a really great talk.
History and irony of logging (in)
There are some interesting tidbits in here about logging and events, including what is possibly the first log analysis exercise.
Making Prometheus more awesome with Thanos
I’ll be thrilled when Prometheus ships with first-party HA support, but until then, Thanos looks like a great solution.
For the Icinga folks out there: automatic certificate scanning, monitoring, and some reporting. Very neat.
Unit testing alerts with Prometheus
As a follow-up to their previous article on unit testing formulas, this one shows that you can do the same with your Prometheus alerts.
Lesson to be learned here: take a step back and really understand what you’re trying to achieve. Less infrastructure to maintain is better, and bonus points if less to maintain also comes with more capability.
Black Friday performance: Third-party outage strikes again
If you work for a company with a large volume of business on Black Friday/Cyber Monday, you’re no stranger to all the prep work that leads up to showtime. I’ve had a front row seat for several of those exercises during my time as a staff consultant, and one thing that always gets me is that everyone spends most of the prep testing the business’s application and infrastructure, while little (if any) attention is paid to third-party dependencies. This past Black Friday/Cyber Monday, one such dependency caused an increase in load time for many websites. Don’t forget that you should be vetting your dependencies and vendors too.
InfluxDays 2018 - Paul Dix Keynote
If you weren’t able to make it to InfluxDays 2018 in San Francisco, the recording of CTO Paul Dix’s keynote is up. He talks at length about where Influx is heading and the new Flux query language.
Amazon CloudWatch Introduces Automatic Dashboards to Monitor all AWS Resources
In the category of “well, better late than never” or perhaps “You know how our partners are really good at a thing? Let’s do that thing too. Surely it won’t bother them,” AWS now has prebuilt dashboards for every service.
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor