Issue 123

This issue is SLO-heavy, with a fair bit of performance engineering and instrumentation mixed in. In short, everything I love to read about and nothing I’m particularly good at.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

3 considerations from building a platform for Observability

Any effective observability platform should exist for the benefit of its customers. A few high-level considerations to keep in mind when beginning your observability journey.

Prometheus, but bigger

Frankly, I never get tired of seeing companies switch between build and buy (and back again). No matter which “team” you’re on, it’s always educational to hear that it’s possible (and cost-effective) to make that pivot. Another win for Thanos.

SLOs should be easy, say hi to Sloth

Sloth generates SLOs easily for Prometheus based on a spec/manifest that scales and is easy to understand and maintain.

For as much as folks talk about SLOs, I haven’t seen a lot of standardization in how we document them, communicate them, etc. I’m very excited to see a project like Sloth surface, and I hope it continues to mature. Interestingly, this is how I first heard of the OpenSLO specification.

Serverless Diary: 3 Expert tips to designing distributed logging system

If you’re already using serverless (or considering it), this is a great primer on how to use distributed logging effectively for your services.

Discover how we set up SLOs at BlaBlaCar

This might be the best article I’ve read about SLOs all year. I love hearing when teams communicate and listen to one another while defining their SLOs and learning from their mistakes.

Automower™ Mapping in Grafana

A fun look at tracking a robotic lawn mower in Grafana using a DIY Python exporter.

5 Takeaways from talking about performance engineering with Taras Tsugrii from Facebook

This article summarizes some of the more interesting takeaways around performance, debugging, and optimizations from an interview with one of Facebook’s performance engineers. If you have time, I strongly encourage you to check out the full podcast.

Grafana dashboards for pgSCV

Following up on the recent release of pgSCV, the developer has published Grafana dashboards covering many of the metrics collected by the exporter.

Setting up Service Monitoring

This post dovetails nicely with the other SLO articles this week. Beyond the golden signals, what else should you be monitoring? Quite a bit, as it turns out.

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

This is a “chonky boi” of a technical article, but there’s so much good stuff in here I simply had to include it. Talk about squeezing every last drop of performance out of a system. And I loooove the inclusion of flame graphs.

Automatic Instrumentation of a Go application using Opentelemetry

I don’t know that I agree with the author’s assertion of “automatic” here, but this looks like a useful instrumentation pattern to share with your fellow Gophers.

Jenkins agent monitoring with Prometheus

A solid introduction to monitoring your Jenkins agents with Prometheus, including some metrics that you’re unable to get with a traditional node_exporter setup.

Tools

OpenSLO

OpenSLO is a service level objective (SLO) language that declaratively defines reliability and performance targets using a simple YAML specification.

Ok, this isn’t technically a tool, but I love that some folks have finally gotten together to formalize a specification for SLOs. Even the Sloth author is aiming to comply with the OpenSLO specification. I’m definitely keeping an eye on this initiative.

Events

Monitorama PDX 2021 - September 13-15 (Portland, OR)

One of the first technical conferences to resume in-person events, Monitorama is returning to Portland, OR, this fall. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!

Job Opportunities

Lead DevOps / SRE at Loomly (Remote)

DevOps Engineer at CareMessage (Remote)

Site Reliability Engineer at Flashpoint (Remote)

Site Reliability Engineer at Skillshare (Remote)

Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side, and we see dozens of these a year, more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)

See you next week!

– Jason (@obfuscurity), Monitoring Weekly Editor