A little bit of everything this week. I hope everyone’s doing well and you’re able to take some time off this summer, especially as vaccines steadily become more available worldwide. Enjoy this week’s collection!

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. This past week there were some fascinating discussions around building monitoring infrastructure from scratch, SLOs, SLIs, and error management practices. Join our Slack so you don’t miss out. 😃

From The Community

Hacking your way to Observability — Part 3

The next part of a series on open source observability, this one pivots from Prometheus metrics to tracing with Jaeger and OpenTelemetry. Another great write-up with comprehensive code examples, diagrams, and configuration snippets.

“THANOS” — Monitoring with Prometheus and Grafana

If you haven’t tried Thanos yet, this is a solid introduction to all the components and a walkthrough for setting it up yourself. Props to the author for including the relevant configurations and command-line steps, along with screenshots from the various UI elements.

How Etsy Prepared for Historic Volumes of Holiday Traffic in 2020

This story is a few months old, but I can’t resist sharing stories from the Etsy engineering blog. Great to see them sharing their experience managing the explosive growth through last year’s holidays due to pandemic lockdowns.

How to Serve 200K Samples per Second with Single Prometheus

Lots of emphasis on Thanos’ long-term retention capabilities. This is one of the things that drove our own adoption of it at $DAYJOB, where it continues to serve us well.

The SLAyer your data pipeline needs

The tool is nice, but I had to include this one just for the application name. Well played.

How We Replaced Splunk at 100TB Scale in 120 Days

I get a lot of value out of Splunk, but you better believe I’ll be looking for alternatives if a vendor acts like they have me locked in. Even at 100TB of daily log ingestion, this team managed to plan and execute their way into an open source alternative stack. But my favorite part of this article is how they’ve documented the processes for a successful transition. You should be able to take these steps and apply them to any software transition.

Psychological safety in a software team

If you read nothing else this week, read this one and share it with your peers. You may not agree with all of their points on deployments and incidents, but the discussion on psychological safety at work is an important one.

Sensu to be acquired by Sumo Logic

Having worked at Sensu during the transition from Ruby to Go, I know it was a lot of work for their team to get to this point. I couldn’t be happier for my friends as they take on their next challenge.

Closing the Loop on Testing Network Changes

Maybe I’ve been out of network administration too long, but I had no idea this kind of stuff was even possible. It kind of blows my mind that you’re able to construct elaborate test frameworks around network configuration changes.

Monitoring Linux systems with SNMP extend method

Speaking of networks, it’s nice to see that folks (who aren’t me) are still using SNMP. Much of this article is written with a focus on Zabbix monitoring, but the SNMP bits could be helpful for other systems as well.

Tools

redis-hawk: granular Redis monitoring

redis-hawk is an open-source monitoring platform designed to help engineers lift the hood and look directly at how data is flowing in a given Redis deployment.

I’m not really sure why you’d choose a domain-specific tool like this over a Redis exporter for Prometheus, but if that sounds like you, this project has you covered.

OpenHistogram – Open source log-linear histograms

Vendor-neutral log-linear histograms for the compression, mergeability, and analysis of telemetry data

This project was announced a few months ago, but it’s worth mentioning here. Nice to see that they’ve released implementations for Go, C# .NET, and JavaScript as well.
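If the term is new to you, here is a rough sketch of the general idea behind log-linear bucketing and why such histograms merge cleanly. It is not the OpenHistogram API: the function names are hypothetical, and the choice of base-10 bins with two significant digits is an assumption made for illustration, so check the project's own specification for the real bin layout.

```python
from collections import Counter
from decimal import Decimal


def bucket(value):
    """Return (mantissa, exponent) for a positive value.

    Assumed layout for this sketch: the bin covers
    [mantissa * 10**(exponent - 1), (mantissa + 1) * 10**(exponent - 1)),
    with mantissa in 10..99, i.e. linear steps inside each power of ten.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    d = Decimal(str(value))            # decimal form avoids binary float drift
    exponent = d.adjusted()            # position of the leading digit
    mantissa = int(d.scaleb(1 - exponent))  # keep two significant digits
    return mantissa, exponent


def histogram(samples):
    """Count samples per bin; the histogram is just bin -> count."""
    return Counter(bucket(s) for s in samples)


def merge(a, b):
    """Merging is plain per-bin addition, so it is lossless and order-free."""
    return a + b


host_a = histogram([1234.5, 1250, 0.3])
host_b = histogram([1299, 5])
combined = merge(host_a, host_b)
```

Because every histogram shares the same fixed bin boundaries, counts from different hosts or time windows can be added together without re-reading the raw samples, which is what makes this kind of structure attractive for telemetry.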

pgSCV — metrics exporter for PostgreSQL

A new monitoring agent and Prometheus exporter for PostgreSQL and related services. They have a full list of supported collectors here.

**[KubeView Kubernetes cluster visualiser and graphical explorer](http://kubeview.benco.io/)**

I can see this being a very useful open source alternative for visualizing your Kubernetes clusters. They even have a demo video you can check out.

Events

SREcon21 Call for Participation - Proposals due June 30, 2021

SREcon is back again for another virtual event in 2021. Proposal submissions are due by June 30; check out their CFP page for full details.

Monitorama PDX 2021 - September 13-15 (Portland, OR)

One of the first technical conferences to resume in-person events, Monitorama is returning to Portland, OR this fall. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!

Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor