Issue 050

Hey folks, welcome to another installment of Monitoring Weekly! Did you write something about monitoring recently? Maybe got an idea rolling around in your head? Send it on over and let the community learn from you. :D

Monitoring News, Articles, and Blog posts
Product Analytics at Square

Not strictly monitoring, but fascinating nonetheless. At the more business-oriented end of metrics instrumentation is product analytics: understanding user behavior and using that information to drive decisions about the product itself. Seeing this perspective should give you a few ideas for how you might help teams beyond Ops/Engineering.

Postmortems and More with J. Paul Reed (video)

The inimitable J. Paul Reed gives his thoughts on keeping postmortems blameless and much more. Definitely worth a watch.

Logging Improvements for InfluxDB 1.5.0

Structured logging comes to InfluxDB’s internal operations. This is a really neat update and super useful for those running larger-scale InfluxDB setups with a need to pay attention to the performance of InfluxDB itself.
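If you haven't worked with structured logs before, here is a minimal sketch of why they're handy, assuming logfmt-style key=value lines. The sample line and its field names are made up for illustration and are not actual InfluxDB output.

```python
import shlex


def parse_logfmt(line):
    """Parse a logfmt-style line (key=value pairs) into a dict."""
    fields = {}
    # shlex.split keeps quoted values such as msg="two words" together.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        # A bare key with no "=" is treated as a boolean flag.
        fields[key] = value if sep else True
    return fields


# Hypothetical log line, for illustration only.
sample = 'ts=2018-03-01T12:00:00Z lvl=info msg="Opened shard" service=store'
record = parse_logfmt(sample)
print(record["lvl"], "-", record["msg"])
```

Once each line is a dict, filtering by level or service is a one-liner instead of a regex puzzle.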

Integrating Bro IDS with the ELK Stack - Part 1, Part 2

Bro is an IDS (Intrusion Detection System) that gathers tons of log data from your systems and network for the purpose of detecting security events. Suffice it to say, it’s a lot of data to go through, and you’re going to want to send it to a capable logging platform. This article goes through how to send, parse, and make effective use of that data in ELK.

Extracting useful duration metrics from HAProxy — Prometheus & Fluentd

Do you use HAProxy? Then you’re probably familiar with the treasure trove of metrics it provides. The author of this article walks us through getting HAProxy metrics into Prometheus, and specifically, getting the elusive Duration metric (from the RED model: Rate, Errors, Duration) using Fluentd.
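To make the idea concrete, here is a rough sketch in plain Python of what the log-parsing step boils down to. It is not the article's Fluentd setup. It assumes HAProxy's default HTTP log format, where five slash-separated timers (in milliseconds) sit right before the status code, and it tallies the last timer (total time) into Prometheus-style cumulative buckets. The bucket bounds and function names are my own choices.

```python
import re

# Five slash-separated timers followed by a 3-digit HTTP status code.
# The last timer may carry a "+" prefix when "option logasap" is in use.
TIMERS = re.compile(r" (-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+) (\d{3}) ")

# Hypothetical bucket upper bounds in seconds; pick ones that fit your traffic.
BUCKETS_S = (0.05, 0.1, 0.25, 0.5, 1.0)


def total_time_seconds(line):
    """Return the total request time in seconds, or None if unavailable."""
    match = TIMERS.search(line)
    if match is None:
        return None
    total_ms = int(match.group(5))
    # HAProxy logs -1 for timers that were never reached (e.g. aborted requests).
    return None if total_ms < 0 else total_ms / 1000.0


def observe(counts, seconds):
    """Update cumulative bucket counts; the final slot is the +Inf bucket."""
    for i, upper in enumerate(BUCKETS_S):
        if seconds <= upper:
            counts[i] += 1
    counts[-1] += 1


line = ('Feb  6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
        '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 '
        '- - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"')

counts = [0] * (len(BUCKETS_S) + 1)
duration = total_time_seconds(line)
if duration is not None:
    observe(counts, duration)
print(duration, counts)
```

In a real setup you would hand the observed value to a metrics library rather than keep counts by hand, but the parsing problem is the same.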

Provisioning Grafana Data Sources and Dashboards Auto-Magically

The author takes us through how to use Grafana 5.0’s new programmatic dashboard and data source definitions.

Ditch your Status Page: How we monitor Crisp at scale

A home-grown self-hosted status page app from the folks at Crisp, complete with some basic alerting functionality. Seems to be primarily built for Node-based apps, though it has some support for HTTP and TCP checks.

The Architecture of the Next CERN Accelerator Logging Service

I missed this article from back in December, but better late than never. The scale of their data is tremendous. Quote: “Huge detectors in the experimental sites observe the collisions producing around 1PB of events per second that is filtered to around 30-50 PB of usable physics data per year.”

Central Logging in Multi-Account Environments

Logging using purely AWS services has always felt kinda wonky to me, in a “wow, that’s…complicated” sort of way. It hasn’t actually changed, but this article from AWS goes over the architecture for a multi-account logging infrastructure using a dedicated AWS account for log processing, aggregation, and storage. I can’t imagine ever using this architecture, but it does give some interesting ideas.

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor