A really fun week of articles, with an emphasis on tracing, PostgreSQL monitoring, and Elasticsearch. Great stuff all around. Enjoy! 🚢💾🐰
This issue is sponsored by:
Engineers are spending an average of 10 hours per week troubleshooting
Yes, you read that right. 10 hours. Check out Chronosphere’s new 2023 Cloud Native Observability Report that surveyed 500 engineers and software developers who weighed in on ways cloud native complexity makes their jobs harder and the hours longer. Read the findings here.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Love this story from a Flipkart engineer talking about their adoption of distributed tracing to help with observability of their inventory propagation pipeline.
This isn’t specifically about monitoring or observability, but so much of this potentially affects how we build and manage the resilience of our systems. Consume this and share with your peers.
This sort of reads like someone’s notes as they master Elasticsearch internals over the course of a year, but I honestly couldn’t put it down. This post is super rich in useful details for anyone who admins Elasticsearch. Heck, most of this information is helpful even for users who just want to understand how its search internals work.
This article touches on the importance of predictable logs, attempting to make a case for structured logs (which imho are a no-brainer), before settling into a pretty good demo using the ELK stack with a Spring Boot application.
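For anyone who hasn't tried structured logs yet, here is a minimal sketch of the idea using only the Python standard library (not the Spring Boot setup from the article). The `JsonFormatter` class, the `build_logger` helper, and the `extra={"fields": ...}` convention are illustrative names I made up for this example, not part of any library.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention for this sketch: callers attach structured
        # key/value pairs via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str = "demo", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order shipped", extra={"fields": {"order_id": 42, "latency_ms": 17}})
```

Because every line is a JSON object with stable keys, a pipeline like ELK can index fields such as `order_id` directly instead of parsing them out of free text with regexes.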
There are three certainties in life - death, taxes, and custom metrics eating up your Datadog budget. This author crafted a quick Python script that will help you determine which of your metrics are (or aren’t) in use, and can therefore be purged from Datadog.
A reminder to consider your observability needs and approach as developers continue to adopt and grow their serverless footprint.
Observability built to drive IT agility and productivity
SolarWinds Hybrid Cloud Observability is a comprehensive, integrated, and full-stack observability solution designed to integrate data from across the IT ecosystem, including network, servers, applications, data, and more. Try it for free today and take your observability to the next level. (SPONSORED)
If you’re looking to monitor your PostgreSQL databases with pgwatch2, but already use Prometheus for the rest of your infrastructure, this guide offers a helpful path for getting the two working together seamlessly.
Speaking of PostgreSQL monitoring, we all know someone (ahem) who archives their databases but forgets to monitor for failures. Here’s a brief but excellent example that you could have running in no time.
Feels like we’re always reading about how to scale Elasticsearch, but here’s an interesting collection of failure conditions and how to debug them. Great stuff here, and yes, I have encountered some of these myself. 🤦‍♂️
For all of the OCI Windows admins in our audience (all three of you), I found this helpful post explaining how to get your System Monitor (Sysmon) events into OCI Logging Analytics using the OCI Management Agent. You’re welcome. 😅
“Flexible self-contained PostgreSQL metrics monitoring/dashboarding solution.”
“Sysmon… is a Windows system service and device driver that, once installed on a system, remains resident across system reboots to monitor and log system activity to the Windows event log.”
There are only a few weeks left to submit talks for Monitorama PDX 2023. Get your proposal in before the Feb 3 deadline! 🤩
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor