SPECIAL EDITION: Q1 2022 Best of
It’s time for another “best of” issue! We have some fantastic articles here covering the most popular topics and themes from the past few months. Enjoy!
This issue is sponsored by:
You might have heard discussions about the “three phases of observability.” But what do they really mean? Chronosphere is a SaaS cloud monitoring tool that helps teams rapidly navigate the three phases of observability. Learn more about Chronosphere and the three phases of observability here.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Who monitors the monitoring system?
A look at how HelloFresh implemented a Dead Man’s Switch on top of their Prometheus and Thanos stack.
Despite the title, this is a fairly deep dive into eBPF internals, writing your own eBPF programs, its potential for observability and much, much more.
5 Dashboard Design Best Practices
Most teams I’ve worked with will slap a bunch of metrics and graphs together without really understanding how to use the data effectively. This is a thoughtful look at how to design a dashboard with your users in mind.
OpenTelemetry democratises access to observability data & will enable massive innovation
We see a lot of articles about OpenTelemetry, but this might be the most concise and helpful one I’ve read yet. Bookmark this one and share it with your peers who need to learn about OpenTelemetry.
Transforming remote JSON into Prometheus metrics
Did you know you could consume data from remote JSON APIs into Prometheus? I can think of a number of different use cases for this. Nice example.
5 key observability trends for 2022
An overview of the most common trends in observability right now. Jibes with everything I’ve seen in this newsletter over the past year.
An insightful look at the bare minimum of metrics that service owners at Salesforce are expected to collect and monitor.
I love this article from Brendan Gregg on why we do (or don’t) choose certain products. Frankly, it feels like the making of a great checklist for any new potential vendor.
Design Patterns and Principles That Support Large Scale Systems
Scaling systems is the kind of challenge that most of us live for, but it takes experience to learn the pitfalls and patterns that save us time and money the next time around. It should be no surprise that so many of these considerations overlap with the observability domain.
A look at some of the differences between SRE and DevOps principles, with a particular emphasis on service levels and monitoring signals.
The Delivery Hero Reliability Manifesto
Reliability means something different to every company, but it’s critical to have a shared understanding of what that means. This manifesto from Delivery Hero is a fantastic example of how to drive consensus and set expectations among your engineering teams.
5 Design Patterns for Building Observable Services
An excellent article from Salesforce engineering, covering their more popular design choices for building observable services.
Rapid Event Notification System at Netflix
Another fantastic article from Netflix engineers about building (and observing) systems at scale.
Saving on AWS Lambda Amazon CloudWatch Logs costs
A really clever way of buffering up debug logs in AWS Lambda to avoid blowing up your CloudWatch budget.
Exploring logging strategies with the Elastic Stack
Considerations for indexing your Elastic Stack logging services. There’s some good stuff in here, but it also reminds me why I happily paid the “Splunk tax” at my last gig.
Scaling Kafka Consumer for Billions of Events
PayPal engineers share their techniques for benchmarking Kafka and testing different failure scenarios before their services went to production.
How secure is your Grafana instance? What you need to know
A fairly exhaustive look at Grafana’s security features. Just note that most of its advanced capabilities are locked away in their commercial offerings.
Getting visibility into your container images
This article introduces a new (to me) tool that looks super helpful for creating an inventory of all the software versions running in a container. I know that you can sort of do this with Prometheus already, but a standalone tool for audits makes a lot of sense too.
How to monitor Starlink with Prometheus
I consider myself fortunate to live in a rural area with fiber internet. If you’re one of the lucky folks with access to Starlink, here’s a quick tutorial for monitoring your connection with Prometheus.
If you’ve been around here for a while, you know I’m highly opinionated about writing alerts that are useful and empathetic towards the engineers who answer them. I love hearing from others who are just as passionate and thoughtful about writing and iterating on alerts.
I love these little weekend projects with dashboards and home automation (or in this case, home network monitoring).
Microservices Observability Design Patterns
So often we get hung up on the tooling and its limitations without really thinking about the problems we’re trying to solve. I love this collection of design patterns for building observability into our (micro)services.
Unpacking Observability: The Paradigm Shift from APM to Observability
How to think about Observability if your organization is stuck in an APM (or monitoring-only) mindset.
Events
Monitorama PDX 2022 - June 27-29 (Portland, OR)
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Job Opportunities
Site Reliability Engineer at Knock (US Remote)
Senior DevOps Engineer at Hive Collective (Remote)
DevOps Engineer at Amount Small Business (Remote)
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture–or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor