Issue 217

A fun, diverse collection of topics this week – from kernel debugging to trace sampling, and an opportunity to compare anomaly detection systems from two industry giants. Enjoy! 📍🌽⚡

This issue is sponsored by:

Armory is declarative Continuous Deployment orchestration to Kubernetes. Empower your developers with automated, multi-environment, and advanced progressive deployments so they can innovate faster, and not struggle to deploy code. Improve developer and customer experience, while reducing risk. Disrupt your market, not your developers.

Learn more.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

From Overwriting Secrets to Modifying Global Client: How Two Incidents Taught Me the Value of Rigorous Testing and Disaster Recovery and Why I’m Still Employed

This story is such a great reminder of what it feels like to be responsible for the health of a service, including the stress and panic when something goes sideways. Would you have done anything differently? How would you have supported this engineer as a peer or incident commander?

Monitoring Kubernetes Pods Resource Usage with Prometheus and Grafana

There’s something concise yet detailed about this mini guide that I really dig. Props to the author for calling out security concerns and for setting expectations accordingly.

Cost-effective tracing with sampling

How (and when) to use sampling in your distributed tracing.

Warden: Real Time Anomaly Detection at Pinterest

A fascinating look at Pinterest’s anomaly detection platform and the algorithm choices they’ve made in its design.

Debugging a FUSE deadlock in the Linux kernel

A reminder that kernel code is not to be trusted and that we’re all basically doomed.

Building a large scale unsupervised model anomaly detection system

I’m including this one primarily as an interesting comparison to Pinterest’s approach. Both seem equally suited to their respective problem domains.

Elevating Kubernetes Network Monitoring with Istio, Linkerd, and Envoy

Although this article is very high-level, it provides numerous jumping-off points tailored to whichever Kubernetes service mesh you’re running.

Remembering the important bits to log

Another acronym to help remind you which ~~metrics~~ logs to collect. (Note: probably easier to remember as “TIRED”, tbqh)

The Single Pain of Glass

A hot take on dashboards. Sorta. I don’t really get the argument that single pane dashboards are good or bad. Any dashboard is only as good as the effort you put into it to make it answer the questions that are relevant to your needs.

Centralizing Cloudwatch observability - Past, Present and Future

A look at the evolution of CloudWatch monitoring across accounts.

Tools

yahoo/egads

“EGADS (Extensible Generic Anomaly Detection System) is an open-source Java package to automatically detect anomalies in large scale time-series data.”

Events

Monitorama 2023 PDX

Just five weeks left until everyone’s favorite monitoring conference of the year. I’m super excited to see the new speakers and to hear what everyone has been up to since the conference returned to Portland in 2022. Hope to see you there!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor