Issue #036

Hey folks, welcome to another installment of Monitoring Weekly! Did you write something about monitoring recently? Maybe got an idea rolling around in your head? Send it on over and let the community learn from you. 😀

Monitoring News, Articles, and Blog posts
Time Series Database Lectures #6 – Mike Freedman (TimescaleDB)

The final talk in the series, this time covering TimescaleDB. Timescale is interesting in that it’s built as an extension to PostgreSQL rather than from the ground up, yet still has some pretty great performance numbers.

InfluxDB 1.4 | InfluxQL Enhancements, Prometheus Read/Write & More

There are a few new things here, but the one that caught my eye is the new EXPLAIN command in InfluxQL: prepend your query with EXPLAIN to see how InfluxDB is going to execute it, which lets you gauge how expensive the query is. It’s basically the same idea as EXPLAIN in many SQL-based RDBMSs.
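
To make the "prepend your query with EXPLAIN" idea concrete, here is a minimal Python sketch that only builds the query string. The helper name `with_explain` and the sample measurement and field (`cpu`, `usage_idle`) are assumptions for illustration, not something from the release notes, and the sketch does not talk to InfluxDB at all.

```python
def with_explain(query: str) -> str:
    """Return the InfluxQL query prefixed with EXPLAIN.

    Hypothetical helper: it only edits the query text and leaves
    a query that already starts with EXPLAIN untouched.
    """
    stripped = query.strip()
    if not stripped:
        raise ValueError("query is empty")
    first_word = stripped.split(None, 1)[0]
    if first_word.upper() == "EXPLAIN":
        return stripped
    return "EXPLAIN " + stripped


# Sample query against an assumed "cpu" measurement.
query = 'SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1h'
print(with_explain(query))
```

You would then send the resulting string to InfluxDB the same way you send any other query and read the plan it returns.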

Elastic Stack 6.0.0 GA is Released

Woo! Lots of new stuff in this major milestone release: zero-downtime cluster upgrades, multiple pipelines in Logstash, major new features in Beats, and much more.

Best Practices for Observability

Not sure where to start on observability? Charity Majors has got you covered with some really helpful tips and best practices.

Practical Monitoring: Book Review and Q&A with Mike Julian

I recently had an interview with InfoQ.com about my upcoming book, Practical Monitoring. Check it out. Want a free chapter before you decide to buy? Here you go.

LISA17 – Sample Your Traffic but Keep the Good Stuff!

I had the pleasure of hearing the beta version of this talk at SF Metrics Meetup the week before LISA, and the LISA version improves on an already great talk.

Transitioning Logging and Monitoring Systems at The Economist

The folks at The Economist set out to improve monitoring by leaps and bounds and made great initial headway with a good strategy: an internal two-day hackathon dedicated to improving monitoring. I really like the approach because it gets people out of the mindset of the usual day-to-day tasks.

Graphite Metrics Stack with Jason Dixon and Dan Cech – Episode 136 – Podcast.init

It really needs no additional exposition: a really good podcast episode with Jason Dixon and Dan Cech about Graphite and the Graphite metrics ecosystem.

[email protected]: Managing Incidents Part I

I quite like this article’s straightforward explanation of alerting as an automated pipeline. Whether an alert is generated automatically by a system or triggered manually by a person, it should go through the same alert pipeline. Taking it a step further, Xero ensures that every alert contains everything necessary to be useful, and manages its on-call rotation with code.
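
As a rough illustration of the single-pipeline idea (not Xero's actual implementation), the Python sketch below sends automated and manual alerts through the same `submit` function, rejects alerts that are missing required context, and picks the on-call engineer from a rotation kept in code. Every name here (`Alert`, `ROTATION`, `on_call`, `submit`, the engineers, the URL) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Alert:
    source: str       # "automated" or "manual"; both use the same pipeline
    service: str
    summary: str
    runbook_url: str


# Hypothetical on-call rotation kept in code: one engineer per ISO week.
ROTATION = ["alice", "bob", "carol"]


def on_call(day: date) -> str:
    """Pick the on-call engineer by cycling through ROTATION each ISO week."""
    week = day.isocalendar()[1]
    return ROTATION[week % len(ROTATION)]


def submit(alert: Alert, day: date) -> str:
    """Single entry point for every alert, however it was raised."""
    required = ("service", "summary", "runbook_url")
    missing = [name for name in required if not getattr(alert, name)]
    if missing:
        raise ValueError("alert is missing: " + ", ".join(missing))
    return (
        f"page {on_call(day)}: [{alert.source}] "
        f"{alert.service}: {alert.summary} ({alert.runbook_url})"
    )


today = date(2017, 11, 20)
auto = Alert("automated", "api", "5xx rate above threshold", "https://example.com/runbooks/api")
manual = Alert("manual", "api", "customer reports login failures", "https://example.com/runbooks/api")
print(submit(auto, today))
print(submit(manual, today))
```

Because both kinds of alert pass through the same validation and routing, a manually raised alert cannot skip the checks that make automated alerts useful.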

Availability has a new meaning. And it doesn’t include planned downtime

It always annoys me when companies carve out “planned downtime” from their availability reporting and goals. This article says what we’re all thinking: planned downtime punishes the customer for our failure to build resilient and maintainable systems.

See you next week!

— Mike (@mike_julian) Monitoring Weekly editor