Issue #089

Latest Articles on

How To Improve On-Call

I pulled this advice from a fantastic list of resources on Twitter and put it all in one convenient location. Enjoy. (Got something you think needs to be in this list? Send it over!)

From The Community

Monitoring Tomcat: Part 1 & Part 2

Got Tomcat? Here’s some great stuff about what you need to know when it comes to monitoring it.

Network Service Level Agreement (SLA) management within a Telco

I feel like people tend to forget that there are real networks and real datacenters underneath our Kubernetes clusters and AWS/GCP/Azure/Heroku/etc. This article is interesting because it explores how SLAs work within a telco environment, touching on Carrier Ethernet technologies and, everyone’s favorite-to-mock protocol, SNMP.

How we built ‘BARITO’ to enhance logging

What do you do when you’re beyond what ELK can reasonably handle? Well, you either fork over a mind-bogglingly large sum of money to Splunk, or you build your own solution. You can guess which option the folks at GO-JEK opted for.

TorfluxDB: Anonymous metrics from Go

I really like the depth and breadth of this article on instrumenting Go code and pushing the metrics to InfluxDB. What caught me off-guard is the last section, though: TorfluxDB is a project that pipes metrics through Tor before they reach the InfluxDB server, in an effort to anonymize them. A typical use case might be software you run yourself whose maintainers also want usage data to improve it.

Three Pillars with Zero Answers – Towards a New Scorecard for Observability

Because the best teaser for this article is really the closing line, I’ll just leave this here: We need to put “metrics, logs, and tracing” back in their place: as implementation details of a larger strategy – they are the fuel, not the car. We need a new scorecard …

Request Id Tracing in Node.js Applications

Sort of an MVP of tracing in Node.js.

Expect the Unexpected: How to Handle Errors Gracefully

Answer to the Ultimate Question of (On-Call) Life, the Universe, and Everything: 71

As I’ve said before, everyone loves a good index. The folks at PagerDuty, armed with a data scientist and a mountain of raw data, have come up with an index to express the on-call health of a given on-call responder. I really love this idea. Sadly, I can’t seem to find any details on what the 16 factors they identified actually are.

Developer On Call

I’m with the author: on-call duties should be paid for, separately from, and in addition to, base pay. There are also some other good points in the article, of course, but I like poking bears with the “f*ck you, pay me” stick.

Sensu Go is here!

A huge congratulations to the team at Sensu for the incredible work in getting this shipped. Sensu Go is a massive improvement over the Sensu we’ve all come to know and love, featuring some really awesome (and long-awaited) stuff: no more Redis or RabbitMQ (thank the gods), no hard requirement for config management, proper multi-tenancy, a versioned API, and much more. If you’re a Sensu user, you should check the release out and be sure to look at the upgrade notes.


jl — JSON Logs, a development tool for working with structured JSON logging.

It’s like if you had jq but without the headache of trying to remember the complex query format. Very neat.

loguru – Python logging made (stupidly) simple

As a Pythonista myself, this is awesome and very welcome. The built-in Python logging library is…meh. This one is decidedly not.


Want your job listed here? Why not submit a post to the job board? It’s only $99/ad for 30 days.

See you folks in January, 2019!

— Mike (@mike_julian) Monitoring Weekly Editor