I love issues like this one that skew heavily toward the technical side, with debugging and hands-on guides. BTW, if you ever run across something interesting that I’ve missed, please reach out and let me know! 📈💾👷‍♀️
This issue is sponsored by:
Can you operate observability data at scale? Have you optimized for speed and performance? While cloud native is the modern architecture of choice, it can slow down your DevOps teams. In this ebook, learn 5 steps to align DevOps and cloud native operations to boost developer productivity.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
An excellent diagnostic post from one of Alertmanager’s contributors. I genuinely love reading this author’s explanations on Alertmanager internals because they do a great job providing examples and sharing their thought process.
In the age of cloud and distributed systems, tags are a requisite for managing all of our disparate resources, particularly when it comes to observability. A broader look at the benefits of tagging versus metrics, though I wish the author would’ve touched on cardinality concerns.
Speaking of cardinality, here are some tips and examples for tracking down and mitigating the sprawl of high-cardinality metrics in your Prometheus cluster. Love this.
Feels like we’re on a roll here… this one’s for Datadog users with a glut of custom metrics weighing down their monthly invoice (oh wait, that’s everyone). 😜
If you weren’t already aware, the OpenTelemetry project has its own official channel on YouTube. Its videos are generally packed with useful information, but I particularly love this Q&A-style interview with Hazel Weakly, where she describes the challenges of, and strategies for, rolling out OpenTelemetry within an organization.
Incident management platform FireHydrant is building an alerting product, which will mark the first time alerting and incident response are offered in one platform. Sign up for early access to Signals by FireHydrant, and be among the first to experience the power of alerting + incident response together, at last. (SPONSORED)
Some useful networking CLI utilities that can aid in debugging a monitoring alert or even serve as the basis for a health check.
Datadog is a popular commercial offering because of its breadth of coverage and capabilities. This post demonstrates a quick way to automate your Datadog monitors for a Kubernetes cluster.
A C-suite-worthy analysis of the observability landscape. If you’re trying to make a case to your leadership for dedicated resources, this could be a good article to share with them.
A very comprehensive guide for monitoring your SQS queues with Datadog.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor