Issue 224

Some fun posts this week, with an emphasis on best practices and logging. Oh, and Monitorama videos! Enjoy! 🌴🪓🏓

This issue is sponsored by:

Can you rely on your deployments?

In a recent Armory and Gartner report, 35% of respondents named reliability and consistency as their top pain point with app deployment. If you need help with consistent, reliable deployments, try Armory Continuous Deployment-as-a-Service. Check out more in the reports here.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

You’re Paying too much for (Cloudwatch) Logs

This post speaks to the trade-offs we face with our technology choices. More specifically, it compares logging costs between CloudWatch, Datadog, and a “custom” solution using AWS components.

Into the Heart of Darkness: Sofia’s Adventure in the Enigmatic Logging Forest

I genuinely can’t tell if this is fan fiction, developer advocacy, or an SRE biopic. Either way, it’s an interesting read.

Videos from Monitorama PDX 2023

Even if you missed the IRL event (or live stream), you can calm your FOMO now that the videos are uploaded for everyone to enjoy.

Prometheus and Thanos: An Ultimate Alliance for Scalable Metrics

An overview of Thanos, its components, and how it complements Prometheus when horizontal scaling becomes a necessity.

Best practices for avoiding race conditions in inhibition rules

Inhibit rules are an important aspect of alerting but can have unexpected behavior if you don’t fully understand how to configure them properly. If you’re alerting with Prometheus and Alertmanager, you should definitely read this post.

Streamline Network Observability on AKS

I don’t run any Kubernetes workloads on Azure, but if you do, this looks like something to consider adopting. On a related note, it’s always nice to see vendors leverage existing cloud-native standards (in this case, eBPF and Prometheus).

Best Practices for Monitoring Static Web Applications

This post from Datadog is going to try to pitch you on their service, but if you can filter through that, it’s still a very detailed look at monitoring various aspects of your static content site or SPA.

Kubernetes logging best practices

Honestly, the title says it all. Although most of the best practices apply to logging in general, it’s still a good review for anyone using or maintaining logging infrastructure in Kubernetes.

How Monitorama changed our lives — a decade on

A reflection on the early days of monitoring and whether anomaly detection has really gotten us anywhere (my words, not theirs).

Service Level Objectives made easy with Sloth and Pyrra

Comparing two different SLO management tools. I love that each of them looks like a solid choice depending on the use case and your own unique needs.

Grafana alert state history: What’s new and improved in Grafana 10

Although Grafana Alerting is still relatively new, the Grafana team discovered early on that its state history system was fairly limited. With the release of Grafana 10, and by leveraging Grafana Loki as the new storage backend, they were able to provide more effective filtering and custom analysis over a wider retention window.

Tools

pyrra-dev/pyrra

“Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!”

slok/sloth

“Sloth generates understandable, uniform and reliable Prometheus SLOs for any kind of service.”

Job Opportunities

TLM, Site Reliability Engineering and Critical Applications at Aurora

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor