Issue 150

I hope everyone enjoyed their holidays (or at least had a quiet on-call rotation) as we tread gently into 2022. Lots of fun articles this week, especially if you geek out over the pillars of observability. Enjoy! 📈🎮

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

High Availability & Fault Tolerance for Monitoring Stack

A good primer for thinking about highly available Prometheus, Alertmanager, and Grafana. TBQH there’s a lot more to plan for when scaling a truly HA Prometheus environment, but this will get you started in the right direction.

Fluent Bit — Write custom output plugins using Golang

A nifty example for writing your own custom output plugins for the Fluent Bit log processor.

Observability in Chaos

One author’s take on why we don’t see more consolidation among open source tools. Personally, I think they might be conflating monitoring tools with the rest of the observability space, e.g. traces (which do a better job of normalizing high-cardinality systems). Still an interesting read with some fresh takes on the space.

Why you should use CloudWatch Embedded Metric Format

An excellent case for using CloudWatch EMF for custom metrics rather than publishing them yourself to PutMetricData.

How We Measure Reliability

We seemingly talk about reliable systems every day, but investing in reliability often competes with enhancements for engineering resources and priority. Nice to hear from a company that treats reliability as an important “feature”.

Quick words on Vector

Great to see a community article on Vector. It started as a high-performance log router (written in Rust), but it appears to be evolving into something akin to an “observability pipeline”. Good to see Datadog’s acquisition of the Vector team hasn’t slowed them down.

Cost Reduction in Goku

As a time-series nerd, I can’t help but geek out when engineers share their notes (and math) on scaling TSDBs. Even though Goku isn’t open source, it makes for a great read if you work with these types of systems.

Monitor CSGO - Counter Strike: Global Offensive with Prometheus

I don’t play CSGO, but I’ve been looking for better ways of tracking performance on my personal desktop. Fortunately, there’s always the Minecraft exporter for us casual gamers.

Spans - a key concept of distributed tracing

A helpful overview of spans and their role in distributed tracing systems.

Thanos: Musings of a Mentee

One developer’s experience contributing to the Thanos project through the Linux Foundation’s mentorship program.

Tools

vectordotdev/vector

“A reliable, high-performance tool for building observability data pipelines.”

galexrt/srcds_exporter

“Prometheus exporter for SRCDS Gameserver using Source RCON.”

sladkoff/minecraft-prometheus-exporter

“A Bukkit plugin which exports minecraft server stats to Prometheus.”

Job Opportunities

DevOps Engineer at Munibilling (Remote)

Cloud Operations Engineer at HomeValet (Remote)

Senior Site Reliability Engineer - Monitoring/Observability at Axon (Seattle, WA)

Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture–or your money back. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor