Issue 157

Some really great stories this week on distributed tracing, eBPF, and scaling networks and services. Enjoy! 😺


What should we expect for the observability space in 2022? Chronosphere’s Co-founder and CEO Martin Mao breaks down his top 3 predictions for what’s to come in observability over the next year. Read Martin’s predictions about open standards adoption, centralization of observability practices, and data growth here!

Articles & News on monitoring.love

Observability & Monitoring Community Slack

It’s been amazing to see the community grow throughout 2021 and into 2022. We’d love to have you join us and share what you’ve been working on.

From The Community

Get Started with eBPF

Despite the title, this is a fairly deep dive into eBPF internals, writing your own eBPF programs, its potential for observability, and much, much more.

Who monitors the monitoring system?

A look at how HelloFresh implemented a Dead Man’s Switch on top of their Prometheus and Thanos stack.

OpenTelemetry Collection: High availability deployment patterns while using the load-balancing exporter

Patterns for supporting high availability in an OpenTelemetry tracing ingestion pipeline.

Rapid Event Notification System at Netflix

Another fantastic article from Netflix engineers about building (and observing) systems at scale.

Why OpenTelemetry Should Matter to Network and Systems Admins

Companies like Boundary (RIP Boundaq) were built around the idea that network observability equates to service observability. I think this article argues something to that effect, though I’m not sure the author goes far enough to make a case for OpenTelemetry here.

Grafana 8.4 release

Yet another Grafana release, this time with improved accessibility, better query performance, new panel options, and enhanced alerting features.

Distributed Transaction Tracing with Transaction Ids in JVM Services

ClassPath engineers share a look at how they’ve adopted distributed tracing across a variety of JVM services. Feels a little DIY, but includes some interesting ideas and patterns.

Do tech companies need an Incident Management Department?

For most companies, I agree the answer is probably still “no”. The IM tooling and automation market is blowing up with plenty of options so you don’t have to do this yourself.

What are cardinality spikes and why do they matter?

A quick primer on cardinality and how it impacts the way we instrument and collect monitoring data.

My Grafana Dashboard

I love these little weekend projects with dashboards and home automation (or in this case, home network monitoring).

Observability: the basics

A friendly overview of Observability and why it matters even more with complex architectures built on microservices and containerized systems.

Events

Monitorama PDX 2022 - June 27-29 (Portland, OR)

Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!

Job Opportunities

Customer Reliability Engineer (K8s) at Replicated (Remote)

Site Reliability Engineer at Replicated (Remote)

Senior Site Reliability Engineer (AWS) at Mandiant (US Remote)

Site Reliability Engineer at Honeycomb (NA Remote)

Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)

See you next week!

– Jason (@obfuscurity), Monitoring Weekly Editor