Wow, this was a wild week. Drama in the open source community, Kubernetes monitoring, anomaly detection and… math! Buckle up and enjoy this week’s newsletter. 😇🐶😈

This issue is sponsored by:

Chronosphere

Observability is a hot topic. So much so that traditional monitoring software vendors are scrambling to offer their takes on observability. Don’t fall for the confusion. Read the 4 key reasons why observability is better than APM tools for cloud native environments.

Articles & News on

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Claims Datadog asked developer to kill open source data tool

We don’t see a lot of drama when searching for monitoring content, so you’ll excuse me if I was taken aback by this report of Datadog “killing” a contribution to the OpenTelemetry project: a tool that would make it possible to export data from their APM service. Woof.

Percentiles don’t work: Analyzing the distribution of response times for web services

I was pleasantly surprised to discover this new post from Adrian Cockcroft with research into response time distributions. A fair bit of the math is above my head, but the implications for analyzing logarithmic time series data are exciting.

20 tips for Prometheus Monitoring

Each of these tips is succinct, yet they leave you wanting a bit more. This post is the tapas of monitoring articles, and I’m here for it.

etcd: getting 30% more write/s

I love a good performance debugging story, and this one from the Zendesk engineering team delivers. Almost makes me miss fixing slow Graphite clusters… almost.

Spotlight on SIG Instrumentation

If you ever wondered how to get more involved in Observability-related projects or working groups for Kubernetes, check out this interview with a couple of the chairs from SIG Instrumentation.

Pushgateway: Getting Prometheus metrics from Ephemeral sources

Although it feels like the “push versus pull” debate was settled years ago, there always seem to be some use cases where push is the inevitable solution. This post goes further than the reference docs, with helpful examples and caveats.

Use MQTT with MetricFire to monitor your IoT devices!

Is your team struggling with manual checks, missed alerts and outdated data for your IoT devices? Get real-time insights and complete visibility into your IoT devices with MQTT and MetricFire. Learn how MetricFire can help you monitor your IoT instances and take your IoT monitoring to the next level. (SPONSORED)

CloudWatch anomaly detection

A quick example of using CloudWatch’s built-in anomaly detection feature. Pretty handy for metrics that you aren’t familiar enough with to define your own thresholds.

Prediction Performance Drift: The Other Side of the Coin

Speaking of anomaly detection, I enjoyed this analysis of prediction performance drift. This topic is a little outside our usual fare, but it has a lot of applications within our space.

How to Monitor CoreDNS

A thorough look at the metrics that matter for Kubernetes CoreDNS and how to monitor it effectively.

Managing Grafana Dashboards With Terraform

Looking to automate your Grafana dashboards but Grafonnet (Jsonnet) isn’t your cup of tea?



Jsonnet libraries for writing Grafana dashboards as code.


Monitorama PDX 2023 - June 26-28 (Portland, OR)

Thanks to everyone who submitted proposals for the recent CFP. The response was amazing and I can’t wait to see the final speaker lineup and agenda. This is the last week to get Early Bird tickets. Grab yours while you still can!

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor