Issue 187

This week was loaded with insightful posts on observability culture, logging challenges, and unique monitoring patterns and best practices. Love it! 😍☕🍂

This issue is sponsored by:

Chronosphere recently teamed up with Prometheus co-founder Julius Volz to talk about the potential pitfalls of Prometheus-based monitoring, best practices to avoid getting burned, and how to get the most from your cloud native monitoring. In case you missed it, catch the webinar on demand here.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Reducing Logging Cost by Two Orders of Magnitude using CLP

A technical deep-dive on Uber’s log management challenges and how they improved their retention and compression.

The importance of observability

I think everyone agrees on the importance of observability, but we approach it differently according to our perceived needs and challenges. Always insightful to hear how others tackle this domain.

Observability strategies to not overload engineering teams

Speaking of adopting observability, it’s easy to be overwhelmed by the landscape of tooling and philosophies. This series aims to simplify observability and make it more approachable for your engineering team(s).

Effortless Alerting for Platforms and Their Tenants

An excellent post from NYT engineers on the challenge of developing best practices around monitoring and alerting for a multi-tenant platform.

Observability is Cultural

A tempered but solid take on the state of observability and how much reliance on any single pillar or tool is too much.

Proactive monitoring: The why, what and how

Interesting to hear from the engineering side of a massive brick-and-mortar enterprise like McDonald’s, particularly on their use of proactive monitoring techniques to help prevent issues in restaurants (though I wish they had provided more details there).

Kubernetes ErrImagePull and ImagePullBackOff in detail

I love Sysdig’s posts on Kubernetes state metrics. Almost makes me ok with having to run Kubernetes. 😅

How to stay one step ahead of errors and downtime as you scale up your business

How FINN views the challenges of scaling your monitoring and reliability alongside your business.

Monitoring with Prometheus

The first two parts of a continuing series on monitoring with Prometheus. If you’ve used Prometheus before, much of this may feel like a refresher, but it’s a solid guide for anyone new to or inexperienced with this system.

Monitor Postgres Performance Using Custom Queries

A different way to run custom queries in Postgres and surface the results as metrics in your ELK stack.

Events

Monitorama PDX 2023

Last week we announced dates and the opening of Early Bird ticket sales for Monitorama PDX 2023. Next year marks the 10-year anniversary of the Monitorama conference series, dating back to the original Boston event in 2013. Hope to see you there!

Job Opportunities

Software Engineer (SRE) at Let’s Encrypt (US Remote)

DevOps Engineer at EnergyHub (US Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor