This issue is sponsored by:
81% of the world’s busiest domains are open to outages because of poor DNS setup. See how yours compares across several DNS security measures, learn how you might be in jeopardy, and find the three easy fixes to make. Get Panopta’s research report, The Perilous State of Global Web Domains in 2019, today.
Latest Articles on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
Probably the most common question I received when I told people I was writing a book about monitoring was, “Have you read James Turnbull’s book?” I’m putting that to rest with a delightful conversation with James Turnbull on a variety of topics, including which of his own books is his favorite, some not-so-subtle digs at Kubernetes, and why James thinks DevOps is dead.
From The Community
Almost certain to start debates in private channels, let’s talk about sampling.
For the Node folks among you, this should be handy.
“… how do we add observability to what we care about the most, our business logic, without clogging up our codebase with instrumentation details? And, if this instrumentation is important, how do we test that we’ve implemented it correctly?”
It’s interesting to me that CloudWatch Logs have a default expiry of “never,” but as someone who looks at a lot of AWS bills, I can say the cost is thankfully never at an “oh shit” level. Still, it’s probably a good idea to set expiration on them to keep things clean.
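If you want to try this, here is a minimal sketch of what setting expiration could look like with boto3. It is not taken from the linked article. The helper names `groups_without_retention` and `apply_retention` and the 30-day default are my own assumptions; the client calls used (`describe_log_groups` via a paginator and `put_retention_policy`) are the standard CloudWatch Logs operations. It only touches log groups that have no retention set, so anything you have already configured stays as it is.

```python
from typing import Any, Dict, Iterable, List


def groups_without_retention(log_groups: Iterable[Dict[str, Any]]) -> List[str]:
    """Return names of log groups that never expire.

    A log group with no retention policy simply has no
    'retentionInDays' key in the describe_log_groups response.
    """
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def apply_retention(client: Any, days: int = 30) -> List[str]:
    """Set a retention policy on every log group that lacks one.

    `client` is expected to behave like boto3.client("logs").
    `days` is an assumed default; CloudWatch only accepts specific
    values (30 and 90 are among them).
    """
    updated: List[str] = []
    paginator = client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for name in groups_without_retention(page.get("logGroups", [])):
            client.put_retention_policy(logGroupName=name, retentionInDays=days)
            updated.append(name)
    return updated
```

With real credentials you would pass `boto3.client("logs")` as the client. Taking the client as a parameter keeps the logic easy to dry-run against a stub before pointing it at a real account.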
I’m always a big fan of taking lessons from outside of tech, so here’s one from the wider world of business strategy.
In case you couldn’t make it to the Seattle Datadog Summit, Datadog has published the talks. There’s some great stuff in here; I really liked the one at the bottom about SLIs and SLOs.
I’m having trouble reconciling how the #15 most hated company in America has such great technical staff, but here we are.
This is a super handy tool for making incident management in Slack less painful. Be sure to watch the video too, which goes into some detail about the thoughts behind it and how Monzo uses it.
After my mention of the Sidekiq monitoring article last week, a reader was kind enough to send me a tool they wrote that makes it even easier. Enjoy!
Oh man, so good: “Success is invisible. That is, the work that goes into creating the conditions for success can be difficult to describe or see. It is driven by our expertise and collective tacit knowledge. This seems a paradox, that we could be successful yet not fully understand the factors that contribute to things going ‘right’.”
Based on years of consulting on this exact topic, I’ve found that better observability is rarely a technical problem. This video is great and should help you get more people on board with the idea.
The folks at Google are presenting this paper later this month and it’s a fascinating read. From the paper: “We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We argue that a mutually beneficial set of Service Level Expectations (SLEs) and Customer Behavior Expectations (CBEs) ameliorates many of the problems of today’s SLOs by explicitly sharing risk between customer and service provider.”
This issue is sponsored by:
Yes, it’s a thing! Blue Medora helps you integrate your on-prem infrastructure and your cloud infrastructure into one place. Rather than making your users learn yet another monitoring tool, Blue Medora acts as a bridge, transparently shipping metrics from your datacenter hardware to monitoring tools of your choice.
Yes, you read that right: Monitorama is doing a new event on the American east coast! I’m super excited.
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor