Understanding Observability (and Monitoring) with Christine Yen

Monitoring and observability are near and dear to my heart, so I was excited to talk to Christine Yen, Cofounder & CEO of Honeycomb, about observability, why dashboards aren’t as helpful as you think, and the value of being able to ask questions of your own application and infrastructure when you’re troubleshooting.

From The Community

KubeCon observability talk recordings

With KubeCon only a couple of weeks ago, this is a nice turnaround time. Here are three videos, with summaries, from the folks at Grafana about their talks at KubeCon.

Real-time data processing for monitoring and reporting — A practical use case of spark structured streaming

Although it’s primarily about Walmart Labs’ A/B testing and experimentation platform, this article applies well to real-time streaming in observability contexts. In fact, they wrote an article about that too.

Tracking activity in cloud applications

This is the announcement of a new, free app called Logbird, built by the folks at Dashbird. It looks really cool. My best summary is that Logbird attempts to make querying streaming logs significantly easier.

DevOps Meets Observability

The quip “All models are wrong, but some are useful” comes to mind here. I like this model.

Inside Gremlin: Staging Monitoring and Alerting GameDay

The folks at Gremlin are trying their hardest to make chaos engineering more applicable and accessible to everyone, and it’s great to know they’re practicing what they preach internally. They recently held a GameDay around monitoring and alerting, and this is the resulting writeup.

Layer 7 Observability with Consul Service Mesh

I still feel like “Let’s add a service mesh!” might be an answer to a question no one is asking, but hey, who am I to judge? Consul now has a service mesh feature, so the folks at HashiCorp wrote up a great piece about observability with it.

kiali/k-charted: Dashboards and Charts library for Kubernetes & Prometheus

Again with service meshes: If you’re using Kiali, I bet this will be super handy.

Jaeger and OpenTelemetry

With OpenTracing and OpenCensus merging into OpenTelemetry, what does that mean for the Jaeger project?

MetricsDB: TimeSeries Database for storing metrics at Twitter

Following up on a pair of posts from 2016 (here and here), Twitter is back with another article about their observability stack. This time they’re talking about MetricsDB, their in-house metrics storage solution.

Andrew Certain on Twitter: “If you’re wondering what “P-four-nines” …

“If you’re wondering what “P-four-nines” means, it’s the latency at the 99.99th percentile, meaning only one in 10,000 requests has a worse latency. Why do we measure latency in percentiles? A thread about how it came to be at Amazon…”
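
If the "one in 10,000" arithmetic feels abstract, here is a minimal sketch of it in Python. It assumes the nearest-rank definition of a percentile (one of several in common use, and not necessarily the one Amazon uses), and the helper name and sample data are made up for illustration.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(samples, pct):
    """Return the nearest-rank percentile of samples.

    pct is passed as a decimal string such as "99.99" so the rank is
    computed with exact fractions rather than floating point.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    fraction = Fraction(pct) / 100
    if not 0 < fraction <= 1:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based position that covers pct of the data.
    rank = math.ceil(fraction * len(ordered))
    return ordered[rank - 1]


# Hypothetical data: 10,000 requests with latencies of 1 ms through 10,000 ms.
latencies_ms = list(range(1, 10_001))

p9999 = percentile_nearest_rank(latencies_ms, "99.99")
worse = sum(1 for latency in latencies_ms if latency > p9999)

print(f"P99.99 latency: {p9999} ms, requests with worse latency: {worse}")
```

With 10,000 samples, exactly one request ends up slower than the P99.99 value, which is the "one in 10,000" from the tweet. With fewer samples, the nearest-rank P99.99 is simply the slowest request, so the number only means much at high request volumes.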

