Driving old, beat-up cars is both a treat and a nightmare, especially when it comes to figuring out why they’ve stopped working (this time). In many ways, diagnosing issues with an old car feels a lot like monitoring for and diagnosing failures in software.
The folks at Dropbox have an interesting challenge: detecting and ingesting crash reports from all of their Dropbox client installs. Easier said than done, certainly. This article works through how they’re handling it and some of the “fun” issues they encounter. I thought it was pretty interesting that they implemented this prior to a move from a Python 2-based client to a new Python 3-based client, with this new crash reporting making the transition much smoother.
My book, along with several other incredible books, is on sale via a Humble Bundle right now. There’s ~$600 worth of top-notch books in the bundle, all yours at a fraction of the price. Bonus: a portion of the proceeds goes to support Code For America. Seems like a win for everyone.
The Monitoring Issue – Linux Journal
My friend Corey Quinn (of Last Week in AWS fame) and I each contributed an article about monitoring to Linux Journal’s monitoring issue, which went live this week.
Why Your Server Monitoring (Still) Sucks – My five most common reasons your monitoring sucks.
The folks at Affirm talk through their own approach to monitoring: how their systems work, the tools they’re using, and what’s next for them (I’m really looking forward to this idea of directly hooking into SQLAlchemy events).
If you’re still wondering what the crap the noise is all about with TimescaleDB and why you might care, start here. Good timing too: they just announced TimescaleDB is now 1.0! Congratulations to the Timescale team on reaching the milestone.
The folks at New Relic continue their discussion of how they define and implement SLOs and SLIs in their SRE org.
Finally done with print (er, puts) statements everywhere in your Rails app? Maybe this seriously monster post on logging in Rails 5 is a good place for you to start.
Is your k8s setup costing you a fortune? Are you under/over-provisioned? How would you know? Maybe this gem will help: kubecost analyzes your k8s clusters on an ongoing basis (native Grafana integration!) and helps you understand what’s costing you and why.
Here’s an idea I can get behind: on-call doesn’t have to be inhumane. We can do better.
Tired of boring use cases for flame graphs? Longing for something that’s just going to make your head spin? How about SQL plan executions visualized with flame graphs? For those of you who love getting into the nitty-gritty of SQL, here you go. Have fun.
I’ll be wandering around the venue on the first day. Come say hi if you see me!
I had the pleasure of speaking with the hiring manager recently and it sounds like a really awesome gig. If you’re into Ops/SRE/DevOps and love monitoring, click through to check it out and apply.
Want your job listed here? Why not submit a post to the job board? It’s only $199/ad for 30 days.
See you next week!
— Mike (@mike_julian)
Monitoring Weekly Editor