Monitoring & Observability 2019 Predictions

Wherein I make my own predictions about what we might see in 2019 in the world of monitoring and observability.

Sarah Mei on Twitter: “My fundamental issue with being on call is that I care more about my personal life & health than I do about whether my employer’s website is operational.”

I freaking love this thread from Sarah Mei on on-call. One of the takeaways is something we’re finally starting to see a little bit of movement on: on-call should be paid, above and beyond your standard pay, and it should be paid whether you’re paged or not. There are a whole lot of other great points in the thread, so I recommend clicking through and reading the whole thing.

Measuring Wikipedia page load times

Frontend monitoring doesn’t get enough love, in my opinion, so be sure to read this article and enjoy it–it’s quite useful.

Loki: Prometheus-inspired, open source logging for cloud natives

Like a distributed, fast grep "error" *.log built for cloud-native infrastructure. Kubernetes is the primary use case right now, but I see a lot of potential here.

Time Series at ShiftLeft

The folks at ShiftLeft go into some detail on their own metrics architecture. Spoiler: TimescaleDB+Kafka+Prometheus+Grafana

Open Sourcing Bro-Sysmon

What do you get when you hook Bro and Windows’ Sysmon together? Well, here’s your answer. The folks at Salesforce have open-sourced their tool here, and it will help you improve your Windows security monitoring.

How Dashboards are Changing Human Behavior in DevOps

Dashboards get a lot of flak these days, but I think it’s telling that the people throwing shade at the concept of “I have dashboards to tell me things” are also those working in very advanced, technically-mature, small environments. The truth is that dashboards are an incredibly valuable asset, and as this article points out, they helped IBM start tearing down silos. Dashboards are great, y’all.

Three Pillars, Zero Answers: We Need to Rethink Observability

Could it be that the industry’s fascination with the “three pillars of observability” is incorrect, misguided, or at least incomplete? Ben Sigelman, co-founder of LightStep, makes a damn good argument that we’re focused on the wrong thing.

Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service

Need multi-tenancy in your Prometheus setup? Built-in HA? Long-term data storage with Cassandra, DynamoDB, and more? Cortex is your thing. It’s used by several companies already, including Grafana Labs, Weaveworks, and even Electronic Arts.

Server Timing

Have I mentioned how much I love the idea of the Server-Timing HTTP header? Well, here’s another fantastic article about it written by an editor of the spec and engineer at Akamai.
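
If you haven't seen the header in the wild, here's a rough sketch of what building one might look like. The helper name and the sample metrics are made up for illustration; only the general `name;dur=...;desc="..."` shape comes from the spec.

```python
def server_timing_header(metrics):
    """Build a Server-Timing header value from (name, duration_ms, description) tuples.

    Hypothetical helper for illustration. Names must be plain tokens
    (no spaces); this sketch does not validate or escape them.
    """
    parts = []
    for name, dur_ms, desc in metrics:
        entry = f"{name};dur={dur_ms:g}"   # duration in milliseconds
        if desc:
            entry += f';desc="{desc}"'     # optional human-readable label
        parts.append(entry)
    return ", ".join(parts)


# Made-up timings for a single request.
value = server_timing_header([
    ("db", 53.2, "Database query"),
    ("app", 47.5, None),
])
print(f"Server-Timing: {value}")
```

Attach that value to your response in whatever web framework you use, and the timings should show up in the browser's dev tools next to the request.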

bloomberg/goldpinger: Debugging tool for Kubernetes which tests and displays connectivity between nodes in the cluster

Not much more I can say beyond the title, really. It looks pretty handy.


