Issue 046

Hey folks, welcome to another installment of Monitoring Weekly! Did you write something about monitoring recently? Maybe you’ve got an idea rolling around in your head? Send it on over and let the community learn from you. :D

Monitoring News, Articles, and Blog posts
Elastic APM GA released

The combining-of-tools continues, with Elastic’s APM product going GA this past week. I’d be curious to hear experiences from anyone using it!

Structured Logging and Your Team

Continuing Honeycomb’s series on structured logging, this guest post from Snyk goes through how they structure their logs. One of the more interesting bits is how they effectively record each request end-to-end by logging, at a minimum, both its start and its end, which lets them follow a request and the actions it encounters while being serviced.
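
To make the start-and-end idea concrete, here is a minimal sketch in Python. It is not Snyk’s implementation: the function names (make_logger, log_event, handle_request) and the field names (event, request_id, path, status, duration_ms) are all assumptions picked for illustration. The point is only that both log lines share a request ID, so they can be joined later.

```python
import json
import logging
import sys
import time
import uuid


def make_logger(stream):
    """Build a logger that writes one JSON object per line to `stream`."""
    # A unique name avoids stacking handlers if this is called more than once.
    logger = logging.getLogger(f"request-log-{uuid.uuid4().hex}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_event(logger, event, **fields):
    """Emit a single structured log line (field names are hypothetical)."""
    record = {"event": event, "ts": time.time(), **fields}
    logger.info(json.dumps(record, sort_keys=True))


def handle_request(logger, path, handler):
    """Run `handler`, logging the start and the end of the request."""
    request_id = uuid.uuid4().hex
    started = time.monotonic()
    log_event(logger, "request_start", request_id=request_id, path=path)
    status = "error"  # assume failure until the handler returns
    try:
        result = handler()
        status = "ok"
        return result
    finally:
        # The end event is logged even when the handler raises.
        duration_ms = round((time.monotonic() - started) * 1000, 3)
        log_event(
            logger,
            "request_end",
            request_id=request_id,
            path=path,
            status=status,
            duration_ms=duration_ms,
        )


if __name__ == "__main__":
    demo_logger = make_logger(sys.stderr)
    handle_request(demo_logger, "/health", lambda: "pong")
```

Any work done in between can log with the same request_id, which is what makes it possible to follow a single request through the logs.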

How production engineers support global events on Facebook

The author walks us through how Facebook prepares a high-traffic service (Facebook Live) for one of its highest-traffic days of the year, New Year’s Eve, through artificial load generation and lots of testing.

Postgres Log Monitoring 101: Deadlocks, Checkpoint Tuning & Blocked Queries · pganalyze

It really is exactly as the title says: what to monitor for inside the Postgres log. I certainly learned a few things, since I have more of a background in MySQL personally.

8 Things to Monitor During a Software Deployment

Straightforward enough for a checklist, which is pretty neat. I’d be wary of relying on load average, though.

Google Cloud Platform Blog: Applying the Escalation Policy

A follow-up to previous installments in the Google escalations series, this one goes through some specific scenarios and how they might play out within Google’s SRE escalation framework. This really isn’t terribly useful for most of you (I’m pretty sure most of us couldn’t put a block on all releases without a very stern talking-to), but it is certainly interesting to read.

threatstack/shush: A management tool for silencing Sensu checks written in Rust

One of the neat things about Sensu is the API and how many useful things you can do with it, such as building new tools for problems that Sensu itself doesn’t solve. (Then again, that might also be one of the sometimes-frustrating things about Sensu.) The folks at Threatstack have released their own tool for silencing checks.

On-call doesn’t have to suck

Another monster knowledge-bomb from Cindy/@copyconstruct. I won’t try to summarize it all here, but it’s definitely worth a read.

Monitoring a critical part of your infrastructure: Amazon Elasticsearch domain

I really like tactical, “Do X” style posts like this. The author goes through exactly what metrics and alerts (suggested starting thresholds too, even!) you should be considering for running AWS Elasticsearch.

Simulating AWS tags in local Prometheus

If you’re running Prometheus, this article could be super handy: how to mimic AWS tags in Vagrant+Prometheus for testing purposes.

See you next week!

– Mike (@mike_julian) Monitoring Weekly Editor