Issue 155

Some great articles this week, with an emphasis on design patterns, reliability, and some new-to-me tools that I’m hoping to try out soon. Enjoy! ☕📈💖

Elastic Stack 8.0 and 10 years of Elastic are coming to you

The next major release of the Elastic Stack is coming soon and it's also time to look back at 10 years of Elastic with the founders. Learn more about both topics and deep dives into observability this Friday, February 11, at ElasticCC, the free technical community conference from Elastic. Sign up today!

Articles & News on monitoring.love

Observability & Monitoring Community Slack

It’s been amazing to see the community grow throughout 2021 and into 2022. We’d love to have you join us and share what you’ve been working on.

From The Community

The Delivery Hero Reliability Manifesto

Reliability means something different to every company, but it’s critical to have a shared understanding of what that means. This manifesto from Delivery Hero is a fantastic example of how to drive consensus and set expectations among your engineering teams.

READS: Service Health Metrics

An insightful look at the bare minimum of metrics that service owners at Salesforce are expected to collect and monitor.

Design Patterns and Principles That Support Large Scale Systems

Scaling systems is the kind of challenge that most of us live for, but it takes experience to learn the pitfalls and patterns that save us time and money the next time around. It should be no surprise that so many of these considerations overlap with the observability domain.

What Is New with Periskop in 2022

Back in 2020, SoundCloud announced the release of Periskop, an exception handling service modeled after Prometheus’ pull model. They’ve posted an update detailing their progress with Periskop along with a list of planned features.

Pro tip: How to use semi-relative time ranges in Grafana

I’ve never heard of this before, but I kind of love it now. Feels like a logarithmic scale for your X axis. ⌚🧙

Kiali: Manage, visualize, validate and troubleshoot your Service mesh!

How to set up the Kiali console for managing and gaining observability over your Istio service mesh.

Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop

How Uber uses its observability data as part of a larger system to help identify fraudulent activity.

The SQL-powered observability backend

Analyze Prometheus metrics and OpenTelemetry traces together using Promscale + the power of SQL. Promscale is open source and built on top of PostgreSQL/TimescaleDB. Get the system insights you need with the technology you’re familiar with. Learn more. (SPONSORED)

Importance of Good Incident Communication

A collection of best practices and principles for managing communications during and after an incident.

PostgreSQL WAL activities

If you’re a Postgres administrator (or work with one), you probably know how important it is to keep an eye on your WAL activities. Here are some really handy queries for trending the health of your database.

Taming cAdvisor’s high CPU usage

A quick fix for minimizing cAdvisor’s CPU impact on your clusters.

Tools

kiali/kiali

“Kiali provides answers to the questions: What microservices are part of my Istio service mesh and how are they connected?”

bookingcom/nanotube

“This is the router (or relay, or reverse-proxy) for Graphite. It routes incoming records according to the specified rules. The Nanotube is designed for high-load systems. It is used at Booking.com to route up to a million incoming records/sec on a single box with a typical production config.”

periskop-dev/periskop

“Pull based, language agnostic exception aggregator for microservice environments.”

Job Opportunities

Senior Site Reliability Engineer, ServiceOps at Wikimedia Foundation (Remote)

Site Reliability Engineer III at Wikimedia Foundation (Remote)

Cloud Backend Engineer at AllSpice (Remote)

Senior DevOps/Senior Site Reliability Engineer at Sana Benefits (Remote)

Hiring Cloud Solution Architect at Papa (US Remote)

Senior Infrastructure Engineer at IRL (Remote)

Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor