Monitoring News, Articles, and Blog posts
Thoughts on Monitoring Docker / Containers

A high-level retrospective on how we think about monitoring containers, their container manager, and the applications running inside them.

We are now Grafana Labs

The company behind Grafana announces its rebranding, the launch of GrafanaCloud (a hosted version of Grafana), and future plans for the Grafana project.

Simple command line stats

We all love a well-designed, thought-out, permanent solution to an engineering problem. Of course, sometimes a clever, quick-and-dirty approach is just what you need.

Debugging a Docker Heisenbug in production

A deep dive into troubleshooting a tricky Docker networking performance problem, and into how hard transient performance problems are to observe.

NSM hardware

Network security monitoring is often overlooked in the web-centric monitoring world, and can be taken for granted by those of us in cloud-native or hybrid architectures. This article goes through how to spec physical hardware for a large-scale Bro deployment.

Docker containers log transport and aggregation at scale

Logging at scale is a common problem for most companies, and especially so with the prevalence of Docker. This “configs-included” guide takes you through a full ELK setup geared specifically toward logging when you’re dealing with more Docker than you can shake a stick at.

Driving user growth with performance improvements

The Pinterest engineering team demonstrates how instrumentation and iterative performance improvements can be used to improve the frontend experience and drive user growth.

Interesting & Useful Tools
Introducing Netflix Stethoscope

Netflix encourages strong security in their corporate IT environment by providing self-service tools for their users. Stethoscope is their first project to embrace the Netflix “User Focused Security” model, encouraging user education over wrist-slapping.

Alerting Framework at Airbnb

Companies operating at scale often opt to build internal monitoring tools, but Airbnb shows it’s not necessary. With a bit of code, they were able to automate and extend configuration of Datadog alerts across several teams.

ctop: concise commandline monitoring for containers

Everyone loves top, right? Now there’s ctop: top for containers. Really not much more to say. :)

Netflix Security Monkey on Google Cloud Platform

Netflix’s Security Monkey, their automated tool for ensuring security compliance settings in AWS, now has beta support for Google Cloud Platform. More interestingly, though, I like that they’re breaking Security Monkey into smaller, more composable pieces. Composable monitoring ftw.

Beringei: A high-performance time series storage engine

Facebook has open-sourced their in-memory, high write-rate / low read-latency time series database. It is designed around the use case of quickly loading a whole lot of metrics across a long time period at once.

