A little bit of everything this week, from experimentation platforms to thoughtful risk planning to quick tips for Datadog and InfluxDB users. Enjoy! 🌞🐶💖
This issue is sponsored by:
Monitoring and Debugging for Modern Cloud Applications
With automated distributed tracing, Lumigo’s serverless monitoring platform visualizes every transaction, allowing you to understand the flow and correlate issues across services. With one click and no manual code changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service. Try Lumigo today.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
It’s easy to take it for granted when large e-commerce sites perform with speed and reliability. This is not an accident; it takes a tremendous amount of intentional effort. Engineers at PayPal have developed their own platform for product experimentation, evaluation, and optimization. Fascinating stuff.
An exceptionally thorough write-up of one company’s approach to metrics collection and monitoring of their Spark application. Even if you’re not using Spark, I’m pretty sure you’ll take away something useful from this article.
I genuinely feel calmer and more serene after reading this story. A level-headed approach to how we think about the risk associated with incidents.
A quick example for setting up Prometheus with Grafana to monitor a Dockerized Jenkins deployment pipeline.
I love when observability companies talk about their systems designs. If I’m being honest, you’d think we’d have figured this all out by now (but it’s still a good read). 😜
Installing and configuring Grafana Loki can be a bit of a hassle. This author hopes to make it simpler, having condensed their experience into a short how-to for monitoring microservices with Loki.
InfluxDB users who query with Flux should experience some performance benefits from this new pushdown pattern for
incident.io has joined your #general Slack channel.
👋 I'm here to sponsor this issue and automate your entire incident management process in Slack. You just focus on fixing the issue, I'll keep your team and status page updated, nudge you to take the important actions, escalate to the right person when needed, auto-generate your post-mortem and make sure follow-up actions are taken care of.
Install incident.io to your Slack, type /incident and I'll take care of the rest.
incident.io has left the chat. (SPONSORED)
An interesting proposal for applying incident management and postmortems to non-systems activities. I like where the author’s going with this; I just wish they’d pushed the concept a little further.
A highly subjective take on the “best” open source metrics system. Although I would take this one with a grain of salt, the author does present some useful context comparing Prometheus and VictoriaMetrics (though for my money, I’d favor ClickHouse).
A couple of quick fixes for anyone struggling to set up their Datadog RUM.
Some moderate security fixes for Grafana Enterprise.
“The o11y toolkit is a collection of tools to work around open source observability products, such as Prometheus, Cortex, Loki, Jaeger and OpenTelemetry.”
Looks like the list of Monitorama speakers has been published for this year’s summer event. Can’t wait to see you all there!
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor