I love reading production stories from companies willing to share their experiences. This week’s newsletter has these, plus plenty of tracing and some creative monitoring and metrics solutions. Enjoy!

This issue is sponsored by:

LogicMonitor

Work. Without the hard work.

LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor.

Articles & News on monitoring.love

Observability & Monitoring Community Slack

Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.

From The Community

Observability at scale: How we built a cutting-edge Dream11 monitoring ecosystem

Most of us don’t get the opportunity to manage observability systems for this much traffic. An insightful look at how Dream11 built out a “web scale” (my words, not theirs) monitoring infrastructure in AWS.

Monitoring Azure AKS applications using the Grafana Observability stack

How one company built its entire observability stack from open source components and runs it on Azure Kubernetes Service (AKS).

Network monitoring – Use AWS Athena to query VPC Flow Logs

My interest in monitoring software started with TCP/IP networking and the NetFlow protocol. This story reminds me a lot of that, but for the modern cloud toolset.

Unpacking Observability: The Path to OpenTelemetry

We’ve seen countless articles explaining what OpenTelemetry is and where and how it can help us. This is one of the few I’ve read that actually walks through the considerations leading up to adoption, the questions to ask ourselves, and how to plan the rollout.

How to Fix Disjointed Traces with Context Propagation

What happens when your OTel-instrumented service talks with a service that uses a different tracing library? Tucows demonstrates how Context Propagation fixes this problem for them.

Grafana 8.3 released

Another minor release for everyone’s favorite dashboarding tool. Some of the more interesting features (to me) include the new Candlestick panel and support for Amazon CloudWatch Metrics Insights.

Monitoring multiple OKE clusters with Prometheus, Thanos and Grafana

A two-part series covering how to set up monitoring for Oracle Container Engine for Kubernetes (OKE) clusters in Oracle Cloud Infrastructure (OCI).


Chronosphere is the only observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Teams at enterprises, large cloud-native, and mid-market companies around the world trust Chronosphere to help them operate scalable, highly available, and resilient applications. Learn more here. (SPONSORED)

Getting Started with Go and InfluxDB

A good primer for working with the Go client library and InfluxDB.

CloudWatch Resource Health Monitoring For EC2 Hosts

Nice to see CloudWatch add some native visual elements for monitoring EC2 hosts. I’d say this is long overdue, but better late than never.

Combining Prometheus and Azure Monitor metrics in Grafana

This feels like a throwback to my old days doing weird transformations with Graphite metrics. Not gonna lie, I enjoyed this more than I probably should. 🤣

Observability: Tracing in AWS Lambdas using AWS X-Ray

A quick look at AWS X-Ray and how to start tracing your Lambdas with it.

Job Opportunities

Senior Infrastructure Engineer at Sysdig (US Remote)

Senior Platform Engineer at Replicated (Remote)

Golang Software Engineer at Replicated (Remote)

Site Reliability Engineer at Mr Yum (AU Remote)

Senior Engineer - Database at Bellese (US Remote)

Site Reliability Engineer at MaxMind (Select US, CA Remote)

Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)

See you next week!

– Jason (@obfuscurity) Monitoring Weekly Editor