Issue 217
A fun, diverse collection of topics this week – from kernel debugging to trace sampling, and an opportunity to compare anomaly detection systems from two industry giants. Enjoy! 📍🌽⚡
This issue is sponsored by:
Armory is declarative Continuous Deployment orchestration to Kubernetes. Empower your developers with automated, multi-environment, and advanced progressive deployments so they can innovate faster, and not struggle to deploy code. Improve developer and customer experience, while reducing risk. Disrupt your market, not your developers.
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
This story is such a great reminder of what it feels like to be responsible for the health of a service, including the stress and panic when something goes sideways. Would you have done anything differently? How would you have supported this engineer as a peer or incident commander?
Monitoring Kubernetes Pods Resource Usage with Prometheus and Grafana
There’s something concise yet detailed about this mini guide that I really dig. Props to the author for calling out security concerns and for setting expectations accordingly.
Cost-effective tracing with sampling
How (and when) to use sampling in your distributed tracing.
Warden: Real Time Anomaly Detection at Pinterest
A fascinating look at Pinterest’s anomaly detection platform and the algorithm choices they’ve made in its design.
Debugging a FUSE deadlock in the Linux kernel
A reminder that kernel code is not to be trusted and that we’re all basically doomed.
Building a large scale unsupervised model anomaly detection system
I’m including this one primarily as an interesting comparison to Pinterest’s approach. Both seem equally suited to their respective problem domains.
Elevating Kubernetes Network Monitoring with Istio, Linkerd, and Envoy
Although this article is very high-level, it provides numerous jumping-off points tailored to the Kubernetes service mesh you’re running.
Remembering the important bits to log
Another acronym to help remind you which logs to collect. (Note: probably easier to remember as “TIRED”, tbqh)
A hot take on dashboards. Sorta. I don’t really get the argument that single pane dashboards are good or bad. Any dashboard is only as good as the effort you put into it to make it answer the questions that are relevant to your needs.
Centralizing Cloudwatch observability - Past, Present and Future
A look at the evolution of CloudWatch monitoring across accounts.
Tools
“EGADS (Extensible Generic Anomaly Detection System) is an open-source Java package to automatically detect anomalies in large scale time-series data.”
Events
Just five weeks left until everyone’s favorite monitoring conference of the year. I’m super excited to see the new speakers and to hear what everyone has been up to since the conference returned to Portland in 2022. Hope to see you there!
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor