My apologies for the super late issue this week! I’m in the middle of moving from San Francisco to Portland, OR, so it’s been hectic. We’ll be back to our normal schedule next week. :)

This issue is sponsored by:

GitPrime: 📈 Data-Driven Guide to Engineering Leadership

Ship faster because you know more, not because you’re rushing. Get actionable insights from 7 million commits and 85,000+ software engineers to increase your team’s velocity. Free Guide

Latest Articles on

Real World DevOps: Observability in Mega-Scale Banking with Greg Parker

Ever thought hard about your company’s observability strategy and the challenges you’re facing? What if your company spanned 70 countries, had 90,000+ employees, and was a bank? My guest certainly thinks about this regularly. In this episode, I speak with Greg Parker, the head of the Enterprise Monitoring Services team at Standard Chartered Bank, about what it takes to design and implement a global monitoring strategy in a complex environment.

From The Community

Scaling Graphite to Millions of Metrics

From the article: “Currently our stack reliably handles over a million active metric keys at any given time across 17 million total metric keys.” Very nice.

Grafana v6.0 Released

The sheer amount of cool stuff in here is overwhelming. Huge props to the Grafana team for this release–well done!

What is a Good Metric?

Not all metrics are infrastructure-related or deep in the code–many of the most important ones are higher-level. This article talks about what makes a good (business-level) metric.

Lighthouse & AWS Lambda: parallel web perf testing on a budget

Historically, there hasn’t been a great way to test web page performance internally, at scale, and with solid performance. I love this solution.

Security Newsletter

There are a lot of newsletters I follow, and Dieter’s Security Newsletter is one of the great ones. Highly recommended.

Resilience Engineering and Error Budgets

From the article: “I’m not a fan of error budgets. I’ve never seen them implemented particularly well up close, though I know lots of folks who say it works for them. I’m not ready to declare bankruptcy on the practice, though I’d like to highlight some of my concerns with respect to human factors, safety, and resilience engineering.”

Monitoring AWS ECS: Part 1, Part 2

The folks at Datadog are back with another great series on monitoring AWS Elastic Container Service.

Extending Vector with eBPF to inspect host and container performance

Netflix’s on-host performance analysis tool gets some neat updates.

Sunsetting Bosun at Stack Overflow and Call for New Bosun Maintainer(s)

The wonderful folks at StackExchange are sunsetting Bosun internally, so they’re looking for a new maintainer.

This issue is sponsored by:

SignalFx: Monitor What Matters Most and Diagnose Anomalies in a Matter of Seconds

When it’s time to troubleshoot an issue, are you providing the right monitoring signals to your team? SignalFx APM helps by providing full distributed tracing, anomaly detection, and predictive analytics – all right out of the box.


Dash 2019 | Call for Proposal

The CFP is now open for Datadog’s DashCon.

Spring Monitoring Meetup - March 6, 2019 - London, UK

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor