Issue #081

From The Community

NetData: Get control of your Linux Servers. Simple. Effective. Awesome.

I linked to NetData a few weeks ago in the context of IoT monitoring and the author, Costa, reached out to show me that NetData is so much more. After an hour talking with him and getting a demo, he’s right: it really is so much more. I encourage you to check it out–it’s incredibly impressive and I’m already seeing potential use cases with my own clients.

Resilience Weekly

My friend Thai Wood, an engineer specializing in reliability at Walmart Labs and former EMT, sees a problem: most of the academic research on resilience, incident management, and emergency triage is tucked away inside esoteric papers and non-technical industry knowledge. His new email newsletter aims to bring the lessons academia and other emergency fields have found into our realm, where we can put them into practice.

Using Audit Logs for Security and Compliance

Every security engineer I know loves logs. Especially all the ones in /var/log/ that most ops people tend to ignore. This article talks about those logs, what’s in them, and why you should care.

Writing Logs Directly to InfluxDB

On the list of “Things I didn’t know I could do,” there’s this: apparently, you can also have InfluxDB handle your logs. How about that.

Designing a Metrics Pipeline for SaaS at Scale: Kong Cloud Case Study

Spoiler: lots of Logstash, Elasticsearch, and Prometheus. Secret ingredient? OpenResty.

What’s up with GitLab.com? Check out the latest data on its stability

I love it when a company is forthcoming about its challenges and efforts with reliability. I also really love it when they make their production dashboards public.

Instrumenting Read Only Transactions in InnoDB

First off, I didn’t even realize there was a READONLY flag, so the “I have no idea what’s going on” feeling didn’t really get any better after the first paragraph. But if you stick with it, the article shows how to use the performance_schema tables to grab some metric data that isn’t typically reported: read-only transactions. The end result can be displayed as a graph, of course.

5 alerting and visualization tools for sysadmins

I like this article, and not only because I’m quoted in it. It does raise one point, though: I really don’t like the term “informational alert” that I used in the Practical Monitoring book, but I can’t think of a term that better describes a message that is sent automatically for “review at your convenience” and doesn’t need to wake someone up. For example, a high child process churn rate in supervisord, or an instance in an ASG that keeps getting killed and respawned: it’s good to know it’s happening so someone can look into it, but it isn’t impacting customers, so there’s no reason to wake anyone up. Anyone got a better term?
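To make the distinction concrete, here is a minimal Python sketch of routing alerts by customer impact. It is only an illustration: the `Alert` class, the `route` function, and the "page" and "review-queue" destinations are hypothetical names, not taken from the article or from any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    customer_impacting: bool  # assumption: someone has classified the alert up front


def route(alert: Alert) -> str:
    """Return a hypothetical destination: page a human, or queue for later review."""
    return "page" if alert.customer_impacting else "review-queue"


alerts = [
    Alert("checkout error rate high", customer_impacting=True),
    Alert("supervisord child process churn", customer_impacting=False),
    Alert("ASG instance repeatedly replaced", customer_impacting=False),
]

# Map each alert name to where it would be sent.
routed = {a.name: route(a) for a in alerts}
```

Only the first alert would wake someone up; the other two land in a queue that gets looked at during working hours.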

Monitoring Pivotal Cloud Foundry – Part 1, Part 2, Part 3

Do you run Cloud Foundry? Do you have trouble monitoring it? This article series should help. Note that it doesn’t apply to Pivotal Web Services (the hosted Cloud Foundry).

Events

NoVA MAMAL Meetup: MySQL Monitoring – November 13, 2018 – Washington, DC USA

The local monitoring meetup group in Northern Virginia is meeting to talk about MySQL monitoring, with the inimitable Baron Schwartz speaking.

Open Source Monitoring Conference – November 5-8, 2018 – Nuremberg, Germany

I’m speaking at OSMC in Nuremberg this November, which is gonna be super fun and super cold. Come on out if you’re in the area.

Jobs

I’ve opened a job board for monitoring and observability jobs. If you’ve got some monitoring/observability roles you’re trying to fill, how about heading on over there? I’ll be including them here in the newsletter as well.

See you next week!

— Mike (@mike_julian) Monitoring Weekly Editor