I’m, like, 90% sure Monitoring Weekly isn’t a tire fire, but then again… What do you think?
This issue is sponsored by:
Getting Started with Telegraf
Telegraf is an open source, plugin-driven server agent for collecting and reporting metrics. It has plugins and integrations to source metrics directly from the system it’s running on, pull metrics from third-party APIs, and listen for metrics via StatsD and Kafka consumer services.
Monitoring News, Articles, and Blog posts
Overview of Monitoring in Azure
I knew the folks at Azure were doing some great stuff, but I hadn’t realized how well they’ve covered the map when it comes to monitoring tooling. Pretty awesome, if you ask me.
Logging is awesome, yo. …except when it’s not (but we don’t talk about those times). Instead, let’s talk about doing it well, and this article is a great read on exactly that. That said, I’ll make a minor disagreement with the author: you probably don’t want secrets/credentials in your logs, at all, ever.
I’ve been half-joking lately that distributed tracing is a bunch of bullshit: it makes for an amazing demo but it’s a total PITA to actually implement and use. That said, the diagrams and use-cases in this article are pretty good at showing the value of tracing in Go. Now if only it didn’t suck so much to do at scale…
Do you really know your response time? (video)
This is such a fantastic talk, given by an engineer from SkyTV, about how they measure performance of delivering video content to their customers. My favorite bits start about five minutes in when he gets into percentiles and how they answer the question, “What’s our 99-percentile performance right now?”
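If you want to play with the percentile idea yourself, here’s a minimal sketch in Python. To be clear, this is not how the speaker computes it: the nearest-rank method, the `percentile` function name, and the sample latencies are all my own assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Return the p-th percentile of samples using the nearest-rank method.

    Hypothetical helper for illustration; real monitoring tools often use
    interpolation or approximate sketches instead.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds, with one slow outlier.
latencies_ms = [120, 95, 300, 110, 2500]

median_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
mean_ms = sum(latencies_ms) / len(latencies_ms)
```

With these made-up numbers the median, the mean, and the 99th percentile land in three very different places, which is why the question you ask of your latency data matters.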
Key metrics for SQL Server monitoring
One of the things I’m often asked to help with as a consultant is a seemingly simple problem: “what the hell do I monitor?” Datadog, who seems to be talking to the same people I am, dropped this series this past week, covering everything you should care about when it comes to MS SQL Server.
I am super excited to see Sensu 2.0 moving along, and now the beta is open to the public. Have at it, folks. It’s pretty neat.
Grafana Module :: Icinga Exchange
I know there’s a whole bunch of you that are big Icinga fans and have been running Grafana alongside it. Now your day just got a bit better: direct integration of Grafana (with InfluxDB and Graphite) into Icinga2.
Brad Lhotsky on Twitter: “Personal rage thread on #Monitoring vs. #Observability. Courtesy of this delineastion: “Monitoring is telling you your systems are broken, Observability is being able to ask why?” 1/”
This is a great rant on the recent monitoring vs. observability debate. tl;dr: we probably ought to get back to doing work instead of debating terminology and splitting hairs.
Google Cloud Platform Blog: Defining SLOs for services with dependencies
Foundational principle of setting SLAs/SLOs: you can only hit an availability as high as your worst-performing dependency (all of you in us-east-1? good luck!). How do you determine a suitable SLO when you have dependencies? How can you be confident of hitting your SLO if you don’t even control the dependency (such as a third-party API)? This has some really great ideas.
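To put rough numbers on that principle, here’s a back-of-the-envelope sketch in Python. It is not taken from the Google post: it assumes every dependency is a hard, serial requirement and that failures are independent, so availabilities simply multiply. The `composite_availability` name and the figures are made up for illustration.

```python
from math import prod


def composite_availability(own, dependencies):
    """Estimate the best-case availability of a service with hard dependencies.

    Assumes (for illustration only) that the service needs every dependency
    to be up and that failures are independent, so availabilities multiply.
    """
    return own * prod(dependencies)


# Hypothetical figures: our own code at 99.95%, a database at 99.9%,
# and a third-party API at 99.99%.
deps = [0.999, 0.9999]
estimate = composite_availability(0.9995, deps)

# The result can never be higher than the worst dependency.
ceiling = min(deps)
print(f"estimated availability: {estimate:.4%} (ceiling: {ceiling:.2%})")
```

Under those assumptions the product comes out below even the weakest dependency, so an SLO you can actually hit has to sit lower still, or you need retries, caching, or fallbacks to soften the dependency.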
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor