Issue 144
This week’s issue is heavy on logging stories, software updates, and remote job postings. Oh, and one of the most unusual applications for Prometheus I’ve seen. Mee-oww. 🙀
This issue is sponsored by:
Start incident response with context to all your alerts in one view
Moogsoft speeds up incident response with dynamic anomaly detection, suppressed alert noise, and correlated insights across all your telemetry data. Go from debugging across multiple tools, screens, and dashboards to a single incident view so you and your teams can take a more proactive approach to reducing MTTR. Sign up for the Moogsoft Free community plan today!
Articles & News on monitoring.love
Observability & Monitoring Community Slack
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Centralized Logging with ELK, Kafka and Kubernetes
How VakıfBank designed and manages their centralized logging infrastructure with minimal loss and delays.
“Prevention is Better than Cure” — Measuring the Health of Software Applications
PayPal’s ETL pipeline handles 30-55 billion events each day. With this much streaming data, any downtime has a cascading effect on downstream applications and analytics. Here are some of the critical metrics they monitor to ensure the overall health of the system.
How to Perform Incident Post-Mortems: Identify Root Cause with “Five Whys”
This might be the best article I’ve read on incident response and postmortems in a long while. Read this. Share it.
Do you remember, the twenty fires of September?
An honest and transparent look at all of Honeycomb’s incidents and lessons learned during a very busy September. Fantastic article.
Know your Azure environment better with Grafana
How one company leveled up their observability by chucking the default Azure Cloud dashboards and replacing them with Grafana.
Monitor Your Pet’s Health with Litter Exporter
Next time someone tells you they’ve “seen some 💩”, just send them this story. There truly is a Prometheus exporter for everything.
How to Improve the Observability of Nginx with Apache APISIX
If you’ve run NGINX in production, you’re probably familiar with the scarcity of useful metrics in the open source version. The Apache APISIX project (based on NGINX’s underlying network libraries) would like you to consider swapping out NGINX for their project (with improved observability).
Work. Without the hard work.
LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor. (SPONSORED)
Integrate Kafka and OpenFunction to Realize Elastic Kubernetes Log Alerts
I’m grateful that most logging isn’t this complicated, but this is still an interesting solution for processing serverless logs and alerts using OpenFunction and Kafka.
Grafana Tempo 1.2 released: New features make monitoring traces 2x more efficient
A new version of Tempo is out, with performance improvements, recent traces search, and a new “scalable single binary” operational mode. Oh, and a handful of breaking changes.
Loki 2.4 is easier to run with a new simplified deployment model
Loki is also seeing a new release, with support for out-of-order logs (wait, seriously?) and its own new “scalable simple deployment” mode.
Tools
“KubeScrape: An open-source dev tool that provides an intuitive way to view the health, structure, and live metrics of your Kubernetes cluster.”
“Privateer is a lightweight Kubernetes prototyping and monitoring tool developed in Electron.js.”
“Kr8s is a desktop application made for developers that need to monitor and visualize their Kubernetes clusters in a user friendly GUI.”
Job Opportunities
DevOps Engineer, Data at Co–Star (Remote)
Sr Infrastructure Software Engineer at Stash (US Remote)
Infrastructure Engineer at Hatch (Remote)
Senior Software Engineer, Kubernetes at Form3 (EU Remote)
Senior Software Engineer, Observability at Cash App (Remote)
Site Reliability Engineer at SetSail (Remote)
Cloud Engineer, Foundation Engineering at Redox (Remote)
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor