Issue 169

A little bit of everything this week, from experimentation platforms to thoughtful risk planning to quick tips for Datadog and InfluxDB users. Enjoy! 🌞🐶💖

This issue is sponsored by:

Monitoring and Debugging for Modern Cloud Applications

With automated distributed tracing, Lumigo’s serverless monitoring platform visualizes every transaction, allowing you to understand the flow and correlate issues across services. With one click and no manual code changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service. Try Lumigo today.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Refactoring and Optimizing a High Traffic API at PayPal

It’s easy to take it for granted when large e-commerce sites perform with speed and reliability. This is not an accident; it takes a tremendous amount of intentional effort. Engineers at PayPal have developed their own platform for product experimentation, evaluation, and optimization. Fascinating stuff.

Monitoring of Spark Applications

An exceptionally thorough write-up of one company’s approach to metrics collection and monitoring of their Spark application. Even if you’re not using Spark, I’m pretty sure you’ll take away something useful from this article.

Handling Incidents Mindfully 🧘🏽 — Part 1: Acceptance

I genuinely feel more calm and serene after reading this story. A level-headed approach to how we think about the risk associated with incidents.

Configuring Grafana and Prometheus to Monitor my Docker Development Environment

A quick example of setting up Prometheus with Grafana to monitor a Dockerized Jenkins deployment pipeline.

Introducing Husky, Datadog’s Third-Generation Event Store

I love when observability companies talk about their system designs. If I’m being honest, you’d think we’d have figured this all out by now (but it’s still a good read). 😜

Leveraging Loki to Proactively Monitor Microservices

Installing and configuring Grafana Loki can be a bit of a hassle. This author is hoping to make it a bit simpler, having condensed their experience into a short how-to for monitoring microservices with Loki.

TL;DR InfluxDB Tech Tips: Optimizations to aggregateWindow()

InfluxDB users who query with Flux should experience some performance benefits from this new pushdown pattern for aggregateWindow().

incident.io has joined your #general Slack channel.

👋 I'm here to sponsor this issue and automate your entire incident management process in Slack. You just focus on fixing the issue, I'll keep your team and status page updated, nudge you to take the important actions, escalate to the right person when needed, auto-generate your post-mortem and make sure follow-up actions are taken care of.

Install incident.io to your Slack, type /incident and I'll take care of the rest.

incident.io has left the chat. (SPONSORED)

Incidents and Postmortems on the Management level

An interesting proposal for applying incident management and postmortems to non-systems activities. I like where the author’s going with this; I just wish they had pushed a little further with the concept.

What makes VictoriaMetrics the next leading choice for open-source monitoring

A highly subjective take on the “best” open source metrics system. Although I would take this one with a grain of salt, the author does present some useful context comparing Prometheus and VictoriaMetrics (though for my money, I’d favor ClickHouse).

For those who have trouble setting up Datadog RUM

A couple of quick fixes for anyone struggling to set up Datadog RUM.

Grafana Enterprise 8.5.3 and 7.5.16 released with moderate severity security fix

A moderate-severity security fix for Grafana Enterprise, shipped in both the 8.5.3 and 7.5.16 releases.

Tools

o11ydev/oy-toolkit

“The o11y toolkit is a collection of tools to work around open source observability products, such as Prometheus, Cortex, Loki, Jaeger and OpenTelemetry.”

Events

Monitorama PDX 2022 - June 27-29 (Portland, OR)

Looks like the list of Monitorama speakers has been published for this year’s summer event. Can’t wait to see you all there!

Job Opportunities

Senior Infrastructure Engineer at Eleanor Health (Remote)

Senior Cloud Native Engineer at Kubeshop (Remote)

Senior Site Reliability Engineer at Barracuda Networks (NA Remote)

Senior Software Engineer - SRE (EGD) at Barracuda Networks (NA Remote)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor