Issue 012

Thanks for joining us for another issue of Monitoring Weekly!

Monitoring News, Articles, and Blog posts
Elasticsearch 5.4.1 and 5.3.3 released & Kibana 5.4.1 and 5.3.3 released

Primarily bugfix releases, but there are some XSS security fixes in Kibana you should probably patch.

Metrics are dead? Thoughts after Monitorama

I love the different take on this. It’s true that Monitorama felt very much “metrics are the past” this year, but I think the author is spot on: metrics are here to stay, and for good reasons.

Kibana baby kick counter - Part 1 & Part 2

I’m a sucker for non-traditional uses of high-end tools. To quote the author: “‘Counting baby kicks is important because changes in your baby’s movement pattern may indicate potential problems with your pregnancy’. Counting and patterns sounds like a technological problem for Elasticsearch and Kibana!” I’ll go ahead and spoil the end for you: no anomalies in the baby’s kick data. Woo!

The Art of Data Visualization

Being in ops, we all love a good line chart. Histograms are finally starting to become a thing, but there are more options for visualization out there. The author goes over seven graph types and their typical use cases.

Using AWS Lambda to Escape the Monitor of Monitors Infinite Loop

This is kinda neat: using Lambda to solve the who-watches-the-watchers problem. This Lambda function reads from CloudWatch and sends events to PagerDuty when an issue is detected. Then again, who’s watching Lambda and CloudWatch? #yodawgproblems
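For a feel of how little code this pattern needs, here is a minimal sketch in Python. It is not the article's implementation: the function names, the `PAGERDUTY_ROUTING_KEY` environment variable, and the choice to page on every alarm in the `ALARM` state are my own assumptions. It relies on boto3's `describe_alarms` call and the PagerDuty Events API v2 endpoint, and it ignores pagination for brevity.

```python
import json
import os
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_pagerduty_event(alarm, routing_key):
    """Turn one CloudWatch alarm record into an Events API v2 trigger body."""
    reason = alarm.get("StateReason", "no reason given")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the alarm name as dedup key avoids a new incident per run.
        "dedup_key": alarm["AlarmName"],
        "payload": {
            "summary": f"{alarm['AlarmName']}: {reason}",
            "source": "cloudwatch",
            "severity": "critical",  # assumption: every alarm pages as critical
        },
    }


def events_for(alarms, routing_key):
    """Keep only alarms that are currently firing and build one event each."""
    return [
        build_pagerduty_event(alarm, routing_key)
        for alarm in alarms
        if alarm.get("StateValue") == "ALARM"
    ]


def handler(event, context):
    """Hypothetical Lambda entry point, meant to run on a schedule."""
    import boto3  # imported here so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch")
    alarms = cloudwatch.describe_alarms(StateValue="ALARM")["MetricAlarms"]
    for body in events_for(alarms, os.environ["PAGERDUTY_ROUTING_KEY"]):
        request = urllib.request.Request(
            PAGERDUTY_EVENTS_URL,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request, timeout=10)
```

Keeping the event-building logic separate from the AWS and HTTP calls means you can test it locally with plain dictionaries before deploying anything.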

Visualising logs matters more than searching them

A great point: when was the last time you went digging through all the logs in your log storage? Roughly never? This article is the beginning of a new tool with an interesting solution: group events by context instead of log level.

Non-intuitive downtime and possibly not lost sales

Every outage results in a corresponding loss in revenue, right? Of course. Everyone knows that. Except maybe we’re wrong. The author pulls data from before and after a series of outages that by all accounts should have resulted in big monetary losses…but the data shows otherwise. Maybe it’s more nuanced than we like to admit.

Ops Tools for Real World

I’m happy to finally see someone else joining the argument I’ve been harping on for a while: monitoring in most companies is nowhere near easy or straightforward. While we love hearing about the latest bleeding-edge approaches at $unicorn-startup, the realities of monitoring in the majority of companies are very different.

dotScale 2017 - Aish Raj Dahal - Chaos management during a major incident (video)

We’ve all been there: incidents with everyone and their dog on the call, some team member calling in from god-knows-where, a well-meaning manager asking for impact scope, a fruitless morning-after RCA (Root Cause Analysis) exercise… This talk has all of it, but more importantly, it also tells you how to fix it.

Public cloud cost control with Prometheus

Showcasing the flexibility and power of Prometheus exporters once again, the author pulls billing data from Azure to Prometheus, where it can easily be graphed, analyzed, and alerted upon, right alongside the rest of the environment metrics. I’m a big fan of storing billing data alongside operational data: it’s a great way to understand higher-level items such as cost-per-customer from an infrastructure perspective.
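If you haven’t written an exporter before, the core of one is just rendering numbers in the Prometheus text exposition format. The sketch below is not the author’s exporter: the metric name `cloud_billing_cost`, its labels, and the input dictionary are hypothetical, and fetching the figures from the Azure billing API is left out entirely.

```python
def escape_label(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_billing_metrics(costs, currency="USD"):
    """Render {resource_group: cost} as Prometheus text exposition format."""
    lines = [
        "# HELP cloud_billing_cost Month-to-date cost per resource group.",
        "# TYPE cloud_billing_cost gauge",
    ]
    for group, amount in sorted(costs.items()):
        labels = f'resource_group="{escape_label(group)}",currency="{currency}"'
        lines.append(f"cloud_billing_cost{{{labels}}} {amount}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up figures, purely for illustration.
    print(render_billing_metrics({"web": 12.5, "db": 40.0}), end="")
```

Serve that text over HTTP on a `/metrics` path and Prometheus can scrape it like any other target, which is what makes graphing cost next to request rate so easy.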

About your friendly editor

I’m Mike Julian, a monitoring/observability consultant and trainer. I help companies stop bleeding money due to outages and performance woes by improving application and infrastructure observability. You can find me at AsterLabs.io.

Do you enjoy Monitoring Weekly?

If you like what you’ve seen, here’s the link to invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them my way by emailing me (just reply to this email).

See you next week!

– Mike (@mike_julian) Monitoring Weekly editor