Latest Articles on

What driving an old jalopy taught me about monitoring

Driving old, beat-up cars is both a treat and a nightmare, especially when it comes to figuring out why they’ve stopped working (this time). In many ways, diagnosing issues with any old car feels not-at-all dissimilar to monitoring for and diagnosing failures in software.

From The Community

Humble Book Bundle: DevOps by O’Reilly (pay what you want and help charity)

My book, along with several other incredible books, is on sale via a Humble Bundle right now. There’s ~$600 worth of top-notch books in the bundle, all yours at a fraction of the price. Bonus: a portion of the proceeds goes to support Code For America. Seems like a win for everyone.

How to instrument Go code with custom expvar metrics

Considering some Golang apps and wondering about instrumentation? Here you go.

Logging in general and in ASP.NET Core

Don’t worry .NET folk, we’ve got you covered too.

Talking Technology: Charity Majors – Times Open

There are a lot of fantastic quotes in here, but since I’ve got alerting on the mind, this one stuck out to me: “… every time you get paged should really be about an unknown-unknown.”

thecasualcoder/tztail: tztail (TimeZoneTAIL) allows you to view logs in the timezone you want

Exactly as the title says. Kinda cool. Just remember: always write them in UTC; anything else is a nightmare.
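
If you want to hold yourself to the UTC rule, here is a minimal sketch using Python's standard `logging` module. The function name `utc_formatter` and the format string are my own choices for illustration, not anything from tztail.

```python
import logging
import time


def utc_formatter() -> logging.Formatter:
    """Return a formatter that renders timestamps in UTC (hypothetical helper)."""
    formatter = logging.Formatter(
        fmt="%(asctime)sZ %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    # Formatter converts timestamps with time.localtime by default;
    # swapping in time.gmtime makes every timestamp UTC.
    formatter.converter = time.gmtime
    return formatter


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(utc_formatter())
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("service started")
```

Write in UTC, then let a viewer such as tztail convert to whatever timezone you want at read time.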

Why Your Server Monitoring (Still) Sucks

I wrote an article for Linux Journal recently on the top five reasons your monitoring still sucks.

Scaling Time Series Data Storage — Part II

This follows up on Part 1 from January 2018: the architecture hit a breaking point and needed a full reworking. Part 2 goes into what the problems were and the new architecture that came out of it.

Flux 0.7 Technical Preview - InfluxData

I know a whole lot of you are InfluxDB users, so this will be useful: the new Flux language is now in InfluxDB 1.7, but disabled by default. Turn it on and have some fun playing with the query language. Personally, I’m a huge fan of the explicit-over-implicit syntax. The Grafana datasource plugin is still in beta, though.

Heatmaps Make Ops Better

If you’re still wondering why heatmaps are awesome, this article has some great graphs to show their value and why other visualizations fall short for some data/questions.

The Problem with Percentiles – Aggregation brings Aggravation

Say it with me, now: stop aggregating percentiles.
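
In case the reason isn't obvious, here is a small worked sketch. The two hosts and their latencies are made up for illustration, and the nearest-rank percentile is just one common definition. Averaging per-host p99s gives a number that matches neither host nor the fleet as a whole.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
quiet_host = [10] * 10                  # 10 requests, all fast
busy_host = [20] * 900 + [500] * 100    # 1000 requests, 10% slow

# Aggregating the percentiles: average the two per-host p99 values.
averaged_p99 = (percentile(quiet_host, 99) + percentile(busy_host, 99)) / 2

# Computing the percentile over the combined raw data instead.
merged_p99 = percentile(quiet_host + busy_host, 99)

if __name__ == "__main__":
    print("average of per-host p99s:", averaged_p99)
    print("p99 of merged requests:  ", merged_p99)
```

The average treats a host serving 10 requests the same as one serving 1000, so it lands well below what the slowest 1% of requests actually saw. Keep the raw data (or histograms) and compute percentiles from that.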

FlameScope Pattern Recognition

You’re a fan of flame graphs, right? The author of those, Brendan Gregg, has come up with a new visualization, subsecond offset heatmaps, and a tool to help with them: FlameScope.

Measuring Performance With Server Timing

This article goes into some depth on a W3C proposal currently making its way through committee: the HTTP Server-Timing header. The header is intended to pass arbitrary metrics via the HTTP response, such as duration, cache result, or whatever you want. The examples section in the proposal gives some ideas about what’s possible. I’m really excited about this one.
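
To give a feel for what the header looks like, here is a rough sketch that builds a Server-Timing value from a list of metrics. The function name and the tuple layout are assumptions of mine. The `dur` and `desc` parameters and the comma-separated layout follow the examples in the proposal as I understand them, so check the current draft before relying on this.

```python
def server_timing_value(metrics):
    """Build a Server-Timing header value (hypothetical helper).

    `metrics` is a list of (name, duration_ms, description) tuples;
    duration_ms and description may each be None.
    """
    parts = []
    for name, duration_ms, description in metrics:
        entry = name
        if description is not None:
            # Escape backslashes and double quotes so the quoted string stays valid.
            escaped = description.replace("\\", "\\\\").replace('"', '\\"')
            entry += f';desc="{escaped}"'
        if duration_ms is not None:
            entry += f";dur={duration_ms}"
        parts.append(entry)
    return ", ".join(parts)


if __name__ == "__main__":
    value = server_timing_value([
        ("cache", 23.2, "Cache Read"),
        ("db", 53, None),
        ("miss", None, None),
    ])
    print("Server-Timing:", value)
```

A web framework would attach the resulting string as the value of a `Server-Timing` response header.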

High Leverage Ep. #1, Monitoring Observability with Monitoring Weekly’s Mike Julian

Joe Ruscio, founder of Librato and now General Partner at Heavybit, and I muse about monitoring for a while.

Why NOT to Build a Time-Series Database

It’s harder than it sounds, as the author would certainly know, being the CEO and cofounder of Outlyer, a monitoring product.

London Monitoring Winter Meetup - November 27, 2018 - London, UK

FOSDEM2019: Monitoring Room CFP Open - February 2-3, 2019 - Brussels, Belgium

GrafanaCon LA 2019 - February 25-26, 2019 - Los Angeles, CA USA


Technical Evangelist - Wavefront - Location Flexible

I had the pleasure of speaking with the hiring manager and it sounds like a really awesome gig. If you’re into Ops/SRE/DevOps and love monitoring, click through to check it out and apply.

