Issue 006

Thanks for joining us for another issue of Monitoring Weekly!

Monitoring News, Articles, and Blog posts
Metrics @ Robinhood

Part One of a multi-part series looking at how Robinhood (the stock trading service, not the backwoods outlaw) collects, manages, and interacts with the metrics used to monitor their internal services. In this post we get a first look at how application metrics get routed through statsd and Kafka to their OpenTSDB storage backend.

Monitoring Redis

Mike Perham gives some tips for monitoring Redis: gathering internal stats from the INFO command, avoiding paging to disk, watching for network latency, and identifying slow commands.
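
If you want to poke at the INFO output yourself, here is a minimal sketch of parsing it into sections. It assumes you already have the raw text (for example from `redis-cli INFO`); the `parse_info` helper and the sample values are made up for illustration and are not taken from Mike's post.

```python
def parse_info(raw):
    """Parse raw Redis INFO text into {section: {key: value}}.

    INFO output is plain text: "# Section" header lines followed by
    "key:value" lines. Values are kept as strings.
    """
    sections = {}
    current = "default"  # hypothetical bucket for keys seen before any header
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            sections.setdefault(current, {})[key] = value
    return sections


# Illustrative sample only; real INFO output has many more fields.
SAMPLE = (
    "# Memory\r\n"
    "used_memory:1048576\r\n"
    "mem_fragmentation_ratio:1.25\r\n"
    "\r\n"
    "# Stats\r\n"
    "instantaneous_ops_per_sec:42\r\n"
)

info = parse_info(SAMPLE)
print(info["memory"]["used_memory"])
print(info["stats"]["instantaneous_ops_per_sec"])
```

From there it is a short step to casting the values you care about to numbers and shipping them to whatever metrics backend you use.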

Metrics @ Robinhood — Part Two

Part Two of Robinhood’s series on metric collection and usage. Here we get a look at how they leverage Sensu and Riemann for health checks, stream processing, and alerting.

Elasticsearch Cluster Lifecycle at eBay

Elasticsearch clusters are so common at eBay that they’ve defined a repeatable lifecycle for deploying, managing, and decommissioning each independent cluster. An insightful read for anyone responsible for the lifecycle of a reasonably complex monitoring stack.

DNSmetrics: Unified Metrics Collection From Multiple DNS Providers

PagerDuty open sourced DNSmetrics, a tool for gathering metrics from DNS providers and collecting them into a unified backend. The tool currently supports polling the Dyn and NS1 APIs and emits the resulting metrics in statsd format for easy ingestion into your monitoring solution of choice.
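
For anyone who hasn't looked at the statsd wire format, it is just `name:value|type` sent over UDP. The sketch below shows the general idea; the metric names and both helper functions are hypothetical and say nothing about how DNSmetrics actually names or sends its metrics.

```python
import socket


def format_statsd(name, value, metric_type="g"):
    """Build one statsd line, e.g. "dns.dyn.queries:1200|g".

    Common types: "g" (gauge), "c" (counter), "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget a statsd line over UDP (8125 is the usual default port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
print(format_statsd("dns.dyn.queries", 1200))
print(format_statsd("dns.ns1.errors", 3, "c"))
```

Calling `send_statsd(format_statsd("dns.dyn.queries", 1200))` would push the gauge to a local statsd daemon, assuming one is listening.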

The future of Log4j input in Logstash

A recent security advisory revealed that attackers can exploit the way Apache Log4j deserializes objects. Although Logstash is not vulnerable to these attacks by default (the log4j input plugin comes disabled out of the box), this highlights the inherent insecurity of object deserialization, even among trusted peer systems. This has led Elastic to deprecate the log4j plugin, with the intent of officially removing it in Logstash 6.0.

Introducing Bolt: On Instance Diagnostic and Remediation Platform

Yet another interesting tool from the folks at Netflix. This time it’s Bolt, an add-on service to Winston. Bolt exposes custom instance-level actions as a REST endpoint, allowing for automated actions, such as self-healing.

Prometheus @CloudNativeCon+Kubecon Europe 2017 - videos

The full set of Prometheus-related talks from the recent CloudNativeCon+Kubecon Europe is now online. There are some pretty interesting ones for you folks using Prometheus (or thinking about it). I particularly liked Alerting in Cloud Native Environments, but they’re all good, really.

Thanks for joining us, folks! If you like what you’ve seen, invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them our way by emailing us (just reply to this email).

See you next week!

– Mike (@mike_julian) & Jason (@obfuscurity) Monitoring Weekly curators