Issue #080

From The Community

SLO BURN Velocity NYC October 2018 (slides)

Jamie Wilkinson’s slides from his recent Velocity NYC talk on SLOs are available. There are enough notes in the slide deck to make it useful without the video. I can’t wait for the video, though: it looks like a great talk.

Why Use K-Means for Time Series Data? – Part 1, Part 2

These articles remind me of why I have a long way to go with my grasp of stats.

Secure Application Metrics & Distributed Logging with SPIFFE

Want more security around your Prometheus endpoints and fluentd config? This article has you covered.

Four Great SaaS Visualizations

Visualization goes hand-in-hand with great monitoring but I’ve found too few of us really think hard about it. This article isn’t about monitoring at all, but rather talks about the business side of things and visualizing business KPIs. That said, there are great takeaways for those of us building visualizations or just creating charts for a report every now and then.

We can do better than percentile latencies

We’ve known for some time that using the average for things like latency results in missing a ton of data, which is why using 95th or 99th percentile is now common. But the author makes another point: many vendors implement percentiles in a pre-aggregated way, resulting in the same problem.
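To make the pre-aggregation problem concrete, here is a minimal sketch in Python. The two hosts and their latency numbers are made up for illustration, and the nearest-rank percentile helper is my own assumption, not anything taken from the article or from a particular vendor's implementation. It compares the average of per-host p99 values against the p99 computed over all the raw requests.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds.
host_a = [10] * 990 + [20] * 10   # 1000 requests, almost all fast
host_b = [10] * 80 + [900] * 20   # 100 requests, a fifth of them very slow

# Pre-aggregated view: compute p99 per host, then average the results.
p99_a = percentile(host_a, 99)
p99_b = percentile(host_b, 99)
averaged_p99 = (p99_a + p99_b) / 2

# Raw view: compute p99 over every request from both hosts.
true_p99 = percentile(host_a + host_b, 99)

print(f"p99 host A: {p99_a} ms, p99 host B: {p99_b} ms")
print(f"average of per-host p99s: {averaged_p99} ms")
print(f"p99 over all requests:    {true_p99} ms")
```

With these made-up numbers the averaged figure comes out at roughly half of the p99 over all requests, because averaging ignores how many requests each host served and where the slow ones landed.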

Thoughts from the Front-line: What You’ve Always Wanted in a Time-Series Analytics Engine for Observability

The cofounder at Wavefront talks a bit about their thoughts on time series architecture, must-have/nice-to-have features, and more.

Chaos Engineering Without Observability … Is Just Chaos (slides)

Slides from a recent talk by Charity Majors, and it’s awesome. The beginning is more about why you should be testing in prod, but keep reading: it gets into some great observability stuff, including an apt comparison of monitoring a monolith vs. a distributed system.

How to manage Prometheus high-availability with PostgreSQL + TimescaleDB

Exactly as the title says: The folks at TimescaleDB suggest an architecture for highly-available Prometheus backed by Postgres and TimescaleDB.

Tools

garie – An open source toolkit to monitor web performance

A neat tool to ship web performance metrics (e.g., Lighthouse) to InfluxDB + Grafana.

pulumi/kubespy: Tools for observing Kubernetes resources in real time, powered by Pulumi.

Quoting the author: What happens when you boot up a Pod? What happens to a Service before it is allocated a public IP address? How often is a Deployment’s status changing? kubespy is a small tool that makes it easy to observe how Kubernetes resources change in real time …

Kubernetes Resource Statistics

A new tool that offers a lightweight alternative to kube-state-metrics for k8s resource metrics.

Grafana v5.3 Released

Chock full of awesome stuff, too, including Stackdriver as a core datasource and a much-improved Postgres query builder.

Open Sourcing Mirus

For those of you with Kafka in your monitoring streaming pipelines, this could be useful: a tool for replicating data between multiple Kafka clusters.

bpftrace (DTrace 2.0) for Linux 2018

For the DTrace fans among you, rejoice in this awesome news.

Events

NewOps Days – October 10th, 2018 – San Jose, CA USA

Splunk is putting together a new event series that looks pretty neat, and it isn’t just a Splunk event in disguise. Best part: it’s free.

Las Vegas Monitoring Meetup – Las Vegas, NV USA

Nothing scheduled yet, but if you’re in the Las Vegas area, get on over there and join the group.

Jobs

I’m launching a job board for monitoring and observability jobs. If you’ve got some monitoring/observability roles you’re trying to fill, how about heading on over there? I’ll be including them here in the newsletter as well.

See you next week!

— Mike (@mike_julian) Monitoring Weekly Editor