Issue 011

Thanks for joining us for another issue of Monitoring Weekly!

Monitorama
Monitorama is one of the few conferences that livestreams the entire event for free, straight to YouTube. While the video editors work their magic and break the streams into individual videos, the raw, unedited streams are available here. Fair warning: each one of these is 8+ hours long.

The Outage

A major power outage in Portland’s downtown area took down the Monitorama venue, prompting incident response from the Monitorama organizers, complete with a status page and regular updates. Neat. (The incident response, not the outage; that part stunk.)

Of course, no conference is complete without attendee recaps. I love the different perspectives and takeaways found in these recaps, so I’m linking all that I found–my apologies if I missed yours. These are submitted without further comment, as I think you’ll enjoy reading all of them.

Monitorama PDX 2017 – Hacker Noon

Monitorama 2017 : My Impressions - Manas Gupta

Monitorama 2017 Summary · Michael Kehoe

Monitorama! | Metal Toad

Monitoring News, Articles, and Blog posts
Going open-source in monitoring, part I: Deploying Prometheus and Grafana to Kubernetes

The first in what looks like it will be a pretty awesome series on implementing open-source monitoring. This article is exactly what the title suggests: Prometheus and Grafana, running on Kubernetes. You won’t find a super deep-dive here, but you will find a configs-included starter approach.

A Million Metrics per Second

I love stories about the monitoring journey teams go through and the lessons they learn about their apps, infrastructure, and themselves along the way. This one is from the folks at Swissquote and is largely Graphite-focused. Also, 1.1 million metrics per second is nothing to sneeze at (everyone thinking “Graphite doesn’t scale” should probably settle down now…)

How we do HumanOps at Server Density

It’s a little hard for me to summarize this one because there are just so many great points, but I’ll try: at the center of your infrastructure are humans, not servers. Build your infrastructure and apps with humans in mind or you’re gonna have a bad time (especially with on-call).

Using Elasticsearch to Detect Signs of Ransomware like WannaCry

Detecting security threats in your infrastructure often comes down to knowing what to look for–signatures, as they’re called. The folks at Elastic walk us through setting up WannaCry signature detection using the Elastic Stack.

Modifications to the current on-call system?

I love finding parallels and inspiration in other fields. When it comes to improving on-call, there is perhaps no better industry to learn from than the medical field. The author, a Malaysian Medical Officer, makes a case for the various ways to improve the on-call experience in her industry. My favorite recommendation is the mandatory time off following an on-call shift.

How Basic Performance Analysis Saved Us Millions

Half the reason (maybe more?) any of us really care about monitoring is that it allows us to not only spot performance problems but also fix them and generally improve our situation. The folks at Heap Analytics ran into such a scenario and walk us through it all. Bonus points for real-world uses of flame graphs.

Tools
LambStatus: Serverless Status Page System

This looks like a slick tool. It creates a StatusPage.io-like page on your own infrastructure using AWS Lambda.

Thanks for subscribing to Monitoring Weekly, folks! If you like what you’ve seen, invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them our way by replying to this email.

See you next week!

– Mike (@mike_julian) Monitoring Weekly editor