A ton of variety this week, with everything from high-level observability and incident management discussions to the introduction of a new OSS network monitoring tool. And all of the videos from Monitorama are now available… enjoy!
This issue is sponsored by:
Sysdig Monitor is making it easier to find important details about your clusters, namespaces, and deployments with a new feature called Advisor. In this webinar, you will learn how Advisor can help you debug and solve difficult Kubernetes problems. Join us at 10am PT on Tuesday, July 26th to add this feature to your troubleshooting toolbox. Save your seat here.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
The videos from last month’s Monitorama event were uploaded this week. There were so many great speakers, but I would make sure you don’t miss out on Sophia Russell and Adrian Cockcroft’s talks from the first day.
A quick example for integrating Kubernetes APIServer latency monitoring into your toolbox.
This post touches on many of the aspects of alert design that we feel but perhaps don’t think about as much as we should. I wish it talked more about the impact of bad alerts on operators, but it’s still a great article.
As someone who got introduced to monitoring concepts and tools through network administration, I was excited to see this tweet about a new Netflow/IPFIX collector based on projects like Kafka and ClickHouse. I’ll be watching this one with a keen eye.
Why Razorpay engineers adopted the “5-Why” root cause investigative technique, and then adapted it based on their own learnings, allowing them to capture additional context and drive better analysis.
A very thorough tutorial for setting up Logstash and Fluentd in your Kubernetes pods with the sidecar pattern. Well done.
Promscale is a durable and scalable Postgres-based storage backend for Jaeger that is much easier to set up and operate than Elasticsearch or Cassandra. It includes out-of-the-box dashboards and full SQL query capabilities to understand the performance and behaviors of your services. (SPONSORED)
If you’re looking to apply some automated anomaly detection to your data, here’s a simple introduction to leveraging Azure’s Anomaly Detector service.
A broad look at Observability, how to distinguish it from Monitoring, some practical examples, and a number of high-level best practices to consider before starting your own observability journey. Share this one with your CIO.
Fixes for a high-severity CVE affecting Grafana alerting and OAuth capabilities were released this week.
Considerations for monitoring your systems on a distributed and/or global scale. This article doesn’t address specific tools, but it covers some of the important decisions to make before you eventually pick one.
“OpenDCDiag is an open-source project designed to identify defects and bugs in CPUs. It consists of a set of tests built around a sophisticated CPU testing framework.”
“This program receives flows (currently Netflow/IPFIX), hydrates them with interface names (using SNMP), geo information (using MaxMind), and exports them to Kafka, then ClickHouse. It also exposes a web interface to browse the collected data.”
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor