I linked to NetData a few weeks ago in the context of IoT monitoring and the author, Costa, reached out to show me that NetData is so much more. After an hour talking with him and getting a demo, he’s right: it really is so much more. I encourage you to check it out–it’s incredibly impressive and I’m already seeing potential use cases with my own clients.
My friend Thai Wood, an engineer specializing in reliability at Walmart Labs and former EMT, sees a problem: most of the academic research on resilience, incident management, and emergency triage is tucked away inside esoteric papers and non-technical industry knowledge. His new email newsletter aims to bring the lessons academia and other emergency fields have found into our realm, where we can put them into practice.
Every security engineer I know loves logs. Especially all the ones in /var/log/ that most ops people tend to ignore. This article talks about those logs, what’s in them, and why you should care.
On the list of “Things I didn’t know I could do,” there’s this: apparently, you can also have InfluxDB handle your logs. How about that.
Spoiler: lots of Logstash, Elasticsearch, and Prometheus. Secret ingredient? OpenResty.
I love it when a company is forthcoming about its challenges and efforts with reliability. I also really love it when they make their production dashboards public.
First off, I didn’t even realize there was a READONLY flag, so the “I have no idea what’s going on” feeling didn’t really get any better after the first paragraph. But if you stick with it, it shows how to use the performance_schema tables to grab some metric data that isn’t typically reported: read-only transactions. The end result can be displayed as a graph, of course.
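If you want a feel for what that looks like in practice, here’s a rough sketch in Python. It is not the article’s code: the query is my assumption about where these counters live (the COUNT_READ_ONLY and COUNT_READ_WRITE columns of the global transaction summary table, which only fill in when transaction instrumentation is enabled), and per_second_rate is a hypothetical helper that turns two samples of an ever-increasing counter into something you can graph.

```python
# Assumed query, not taken from the article: the global transaction summary
# table in performance_schema exposes read-only and read-write counters.
READ_ONLY_QUERY = """
SELECT COUNT_READ_ONLY, COUNT_READ_WRITE
FROM performance_schema.events_transactions_summary_global_by_event_name
WHERE EVENT_NAME = 'transaction'
"""


def per_second_rate(prev_count, curr_count, interval_seconds):
    """Convert two samples of a cumulative counter into a per-second rate.

    Hypothetical helper for illustration only.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # The counter went backwards (for example after a server restart),
        # so count only what has accumulated since the reset.
        delta = curr_count
    return delta / interval_seconds
```

You’d run the query on a fixed interval with whatever MySQL client you already use, feed consecutive COUNT_READ_ONLY values into per_second_rate, and ship the result to your graphing tool of choice.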
I like this article, and not only because I’m quoted in it. One particular point the article brings up, though: I really don’t like the term “informational alert” that I used in the Practical Monitoring book, but I can’t think of a term that better describes a message sent automatically for “review at your convenience” but doesn’t need to wake someone up. For example, high child process churn rate in supervisord or an instance in an ASG that keeps getting killed and respawned: good to know it’s happening so someone can look into it, but it’s not impacting customers, so it’s unnecessary to wake someone up. Anyone got a better term?
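Whatever we end up calling it, the routing decision itself is simple. Here’s a minimal sketch in Python, with every name in it (Alert, route, the destination strings) made up for illustration rather than taken from any real alerting tool: customer impact is the only thing that decides whether a human gets woken up.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; the fields are illustrative only."""
    name: str
    customer_impacting: bool


def route(alert):
    """Page a human only when customers are affected; otherwise queue it."""
    return "page" if alert.customer_impacting else "review_queue"


# The ASG example above is worth knowing about, but it can wait until morning.
print(route(Alert("asg_instance_respawn_loop", customer_impacting=False)))
```

In a real setup the "review_queue" destination might be a ticket, a chat channel, or a daily digest, which is exactly why the name for it is so slippery.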
Do you run Cloud Foundry? Do you have trouble monitoring it? This article series should help. Note that it doesn’t apply to Pivotal Web Services (the hosted Cloud Foundry).
The local monitoring meetup group in Northern Virginia is meeting to talk about MySQL monitoring with the inimitable Baron Schwartz speaking.
I’m speaking at OSMC in Nuremberg this November, which is gonna be super fun and super cold. Come on out if you’re in the area.
I’ve opened a job board for monitoring and observability jobs. If you’ve got some monitoring/observability roles you’re trying to fill, how about heading on over there? I’ll be including them here in the newsletter as well.
See you next week!
— Mike (@mike_julian) Monitoring Weekly Editor