Did you know I have a podcast too? Check it out: Real World DevOps

This issue is sponsored by:

Build the Resilient Future Faster: Creating a Culture of Reliability eBook

It’s the story of how VictorOps builds a DevOps culture centered on accountability and collaboration in order to deliver more reliable services and bolster SRE efforts. Written by Jason Hand, it covers both the technical side (monitoring, observability, alerting) and the cultural side (collaboration, workflow transparency).

Latest on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

SRE Observability: Metric Namespaces and Structures

Namespacing metrics is hard, yo.

DEV Community Is Open-Sourcing Its Performance Metrics via Skylight

Including this because it introduced me to a new tool: Skylight.

How do you measure the success of a rock concert?

This one has absolutely nothing to do with Ops/Engineering…or does it? I’ve been refreshing my knowledge of SLIs and SLOs lately, while also re-reading How To Measure Anything, and something stood out to me this time: an SLI is directly correlated with how happy your users are, but how do you measure user happiness? As the article’s example shows, what you really want are leading indicators, not lagging ones. For example, “revenue earned” is a lagging indicator, while “shopping carts abandoned” is a good leading indicator.
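To make the leading/lagging distinction concrete, here’s a minimal Python sketch (not from the article) that turns cart abandonment into a ratio you could alert on. The `CheckoutWindow` shape, the per-window counts, and the 0.3 threshold are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckoutWindow:
    """Hypothetical per-time-window counts from a checkout funnel."""
    carts_started: int    # sessions that put something in a cart
    carts_completed: int  # sessions that finished checkout


def abandonment_rate(w: CheckoutWindow) -> float:
    """Fraction of started carts that were abandoned (a leading indicator)."""
    if w.carts_started == 0:
        return 0.0  # no carts started, so nothing was abandoned
    return (w.carts_started - w.carts_completed) / w.carts_started


def flag_windows(windows, threshold=0.3):
    """Return indices of windows whose abandonment rate exceeds the threshold."""
    return [i for i, w in enumerate(windows) if abandonment_rate(w) > threshold]


windows = [
    CheckoutWindow(carts_started=200, carts_completed=160),  # 20% abandoned
    CheckoutWindow(carts_started=210, carts_completed=120),  # ~43% abandoned
]
print(flag_windows(windows))  # [1]
```

The idea is that abandonment can be measured in the same window where users get unhappy, rather than waiting for it to surface later in a revenue number.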

Logs, metrics, and the evolution of observability at Coinbase

Spoiler: ELK, Datadog, and a bunch of glue. Perhaps most interesting here is that they’re successfully using Amazon Elasticsearch Service in some areas.

fastly/sidekiq-prometheus: Prometheus instrumentation for Sidekiq

Two awesome tools, now working together.

**[Logflare Tail -f Cloudflare Logs](https://logflare.app/)**

From the site, “Because Cloudflare doesn’t give you logs unless you’re on an Enterprise plan.” I can’t help but wonder how long until Cloudflare shuts this capability down. In the meantime: neat!

Introduction to HAProxy Logging

One thing I love about the HAProxy folks is how understated they are. This “introduction” is more in-depth than many “deep dives” I’ve read.

Performance monitoring with OpenTracing, OpenCensus, and OpenMetrics

If you were starting to get confused by the silly ‘OpenWhatever’ naming patterns, this article from the folks at Datadog does a great job of explaining what the hell is going on.

When Holt-Winters Is Better Than Machine Learning

The math proof is even included at the end. 💥

Loki - Prometheus For Logs - FOSDEM 2019 (video)

Tom Wilkie, VP of Product at Grafana Labs, gave a great talk about their new logging tool, Loki, at FOSDEM 2019.

Postmortems - Part 2: How to Adopt a Learning Culture

Following up on their announcement of their postmortem documentation, PagerDuty talks more about how to adopt a learning culture in your org, a crucial but often-overlooked part of improving your postmortem process.

This issue is sponsored by:

Monitoring on-prem infrastructure with…Stackdriver?!

Yes, it’s a thing! Blue Medora helps you integrate your on-prem infrastructure and your cloud infrastructure into one place. Rather than making your users learn yet another monitoring tool, Blue Medora acts as a bridge, transparently shipping metrics from your datacenter hardware to monitoring tools of your choice.


Spring Monitoring Meetup - March 6, 2019 - London, UK

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor