Whew, what a week. Just when you thought things were slowing down for the holidays, numerous security exploits hit close to home. No worries, we’ve got some upbeat news too… SysAdvent is back, a new Python-esque debugger from Facebook, and a handful of production troubleshooting stories. Enjoy!

This issue is sponsored by:

Work. Without the hard work.

LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor.



Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Troubleshooting troubles

Observability tools and data are good to have, but how do you apply them effectively? A thorough collection of tips and considerations for troubleshooting systems.

Microservice Monitoring with Grafana & Prometheus

Monitoring your Spring microservices with Prometheus and Grafana.

What made SLOs so messy (and what we can do about it)

SLOs are one of those things that seem easy, but so few companies seem to use them effectively or consistently. Here are some examples of why we’re doing it wrong.

drgn: How the Linux Kernel Team at Meta Debugs the Kernel at Scale

How Facebook troubleshoots Linux kernels and userspace applications using the drgn debugger. Nice to see that the tool was written with scripting capabilities (specifically Python) in mind.

SysAdvent: Day 2 - Reliability as a Product Feature

I’m thrilled to see SysAdvent return this month. This post makes the case for reliability being a first-class product feature. Might be a good resource to share with your company’s Product leadership.

From Monitoring to Observability in Snapp!

How Snapp evolved from using monitoring tools to adopting an observability mindset and processes.

Pinterest Druid Holiday Load Testing

I love hearing how engineering teams think about scale and latency during the holiday seasons. Even if you’re not using Druid, there are some great takeaways for capacity planning in general.

The complete guide to error monitoring and crash reporting

Software bugs are frustrating for everyone. End users lose patience and leave, developers struggle to reproduce errors, and businesses lose customers without even knowing why. Learn why modern development teams need error monitoring more than ever. Read the guide. (SPONSORED)



Measuring Web Performance at Airbnb

How Airbnb engineers squeeze every last second of performance out of their web load times.

Grafana security fixes

This was a busy week for the Grafana team and community, with multiple security vulnerabilities and fixes released. I’ve compiled all of the relevant posts for you here. Please update your affected systems immediately, if you haven’t already.

Critical vulnerability in Apache Log4j library

Speaking of security vulnerabilities, this one was a doozy. In case you missed the gloom and doom (and memes), a critical vulnerability affecting the Apache Log4j library was discovered. There are remote code execution exploits already in the wild. 😱

Decorate the Python function

Here’s a great resource for adding observability to your Python Lambdas. I’ve listed the three most recent posts, but the entire series is worth your time.
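
If you haven’t seen the decorator approach before, here’s a rough sketch of the general idea. This is my own minimal illustration, not code from the series: the `observed` decorator, the logger name, and the `handler` function are all hypothetical, and it only uses the Python standard library. It wraps a Lambda-style handler and logs one JSON line per call with the function name, outcome, and duration.

```python
import functools
import json
import logging
import time

# Hypothetical logger name, chosen for this illustration only.
logger = logging.getLogger("observed")


def observed(func):
    """Log the outcome and duration of each call as a single JSON line."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            outcome = "error"
            raise  # re-raise so callers still see the original failure
        finally:
            record = {
                "function": func.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
            }
            logger.info(json.dumps(record))

    return wrapper


@observed
def handler(event, context):
    # Hypothetical Lambda-style handler, for illustration only.
    return {"statusCode": 200, "body": event.get("name", "world")}
```

The nice part of this pattern is that the handler body stays free of instrumentation code, and `functools.wraps` keeps the original function name intact for anything that inspects it.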

Tools

osandov/drgn

drgn (pronounced “dragon”) is a debugger with an emphasis on programmability. drgn exposes the types and variables in a program for easy, expressive scripting in Python.

Job Opportunities

Senior Site Reliability Engineer - Observability at Alteryx (Remote)

Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture–or your money back. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor