There’s been another outage…and the monitoring system didn’t catch it.
Why didn’t the monitoring system catch it? Because you didn’t have the right data.
Why didn’t you have the right data? Because you didn’t know you needed it.
That is, until the outage happened. Now you know you need it.
If you’re responsible for ensuring the servers and applications are available and running smoothly, I’m sure you’re no stranger to this scenario. It’s certainly happened to me more than once.
The most frustrating part about it is having to answer to other teams and management about why you didn’t have the data to begin with. Isn’t that the job of the monitoring system, to catch these sorts of things? What good is it, if it isn’t doing that?
It’s not always easy to know what metrics and logs you need ahead of time. More often than not, instrumentation is added haphazardly after some incident.
What if you were able to look at an existing or new service/app and know within minutes what metrics and logs you needed to get from it?
What if you could approach all of this systematically and know you aren’t missing something critical? What if you really had confidence that your monitoring system had all the data it needed, and it was the right data?
Sound crazy? I swear it’s not.
I’ve been teaching my consulting clients how to systematically approach monitoring and determining all the instrumentation they need.
A Y Combinator fintech company went from basically no monitoring at all to a clear, easy-to-understand strategy for monitoring (and a solid implementation too, of course).
A well-known enterprise company transformed their understanding and approach to instrumentation in just days, allowing them to finally spot a tricky performance problem they had spent months looking for.
These companies no longer guess about what metrics or logs they need. They no longer have to answer uncomfortable questions about why the monitoring system didn’t have a piece of crucial data.
They have confidence in their monitoring, something they didn’t have before.
Companies typically pay $30,000+ for personalized help with this, so I’ve turned the consulting help into a self-paced video course. You’ll learn the same concepts I’ve taught to my clients.
This course is eight videos of instruction covering several important topics, including a quick overview of monitoring and observability, how to pick monitoring tools, and much more.
Monitoring is an overloaded term these days, so let's get on the same page and make sure we’re talking about the same thing. Length: 00:07:18
What is "observability"? How does it inform your monitoring efforts? Why does it even matter? Length: 00:06:03
Don’t pick your tools based on what you’re familiar with, or what the last job used. Let me show you how to determine if your monitoring tools have all the capabilities you need, and if not, what you should look for in a monitoring tool. Length: 00:14:51
Don't know what tools to use, but have an affinity for SaaS options? These are my favorites. Length: 00:05:17
Don't know what tools to use, but just love running your own monitoring platform? You can’t go wrong with these. Length: 00:07:43
Three models form the base of the framework, covering business KPIs, service-level metrics, and host-level metrics. Combine them, and you’ll never again worry that you’ve missed something important. Length: 00:13:27
A walkthrough of applying the instrumentation framework to my favorite app, tater.ly, the world’s premier French fry review website ;) Length: 00:16:03
Three strategies for how to monitor something you don't control or have access to. Length: 00:06:03
"This course was great! I definitely recommend it for anyone who wants to understand how to get started with improving their monitoring."
"Mike's not only the foremost expert in this space, he's arguably the only one who's not trying to sell me a monitoring product."
"This course was very useful for helping me understand modern monitoring, choosing the right tools, and the right metrics. The UCA methodology was especially helpful for understanding business metrics."
Here's my promise to you: if the course doesn't help you sort out your monitoring challenges, just let me know within 60 days of purchase and I'll give you a full refund. Seriously, I'm not kidding: if you don't feel this course helped you out, I'll refund your purchase, and you've got an entire 60 days to try it out.
I'm Mike Julian, the author of the popular O'Reilly book, Practical Monitoring, the Editor of the Monitoring Weekly newsletter, and a consultant on all-things-monitoring. Unlike almost every other monitoring expert, I don't work for a vendor, which allows me a certain flexibility and candidness you won't find elsewhere. I've worked with tons of great companies to help them improve their monitoring, such as Docusign, Airbnb, Hornblower Cruises & Events, and many others (ask me about monitoring supercomputers at Oak Ridge National Laboratory sometime--it's a fascinating problem). It's my mission to share what I've learned and help everyone improve their monitoring.
This course is for anyone who is responsible for building or improving application/infrastructure monitoring. That means you: SREs, DevOps engineers, and sysadmins. Yes, software engineers will also find this useful.
If you don’t have the interest, time, or authority to make meaningful changes to your monitoring, this course isn’t a good fit for you right now.
Let me assure you: I’ll be the first to say that doing anything worthwhile means hard work, and this course is no exception. It won’t fix all of your problems, but it will most certainly put you on the right path--as long as you put in the work to implement what I’m teaching you.
This course will teach you how to identify instrumentation points and the metrics you need from your applications and infrastructure. By the end of it, you can expect to have a solid strategy for instrumenting your systems.
The course is only 75 minutes long, and you’ll no doubt start spotting areas for improvement in your own environment as you watch. Implementing everything in the course may take you a few days to a week. If you follow the lessons, you can expect to see tangible results within a week.
The course is aimed at all levels, but you do need some baseline experience with systems administration and monitoring systems. For example, if you're a student still in school, you probably won't find this course helpful. If you're an experienced engineer, you'll find the frameworks in Module 3 useful in guiding how you talk about, think about, and teach monitoring to other people and teams in your company.
As a senior engineer, much of your responsibility is in teaching other people what you know. While you may not learn anything new about monitoring personally, you will most certainly learn new ways for you to teach it to others in your company. Module 3 will be of particular interest to you.
You'll need a baseline level of experience with systems administration, but really no more than a year or two of experience.
Very much so! As someone experienced in tech already, you're well-prepared to immediately understand and start implementing the lessons here.
Absolutely! In fact, you're going to have an even easier time of it because you don't need to worry as much about the infrastructure. Everything you learn here will be just as applicable as if you were running in AWS, GCP, Azure, OpenStack, Kubernetes, or your own datacenter. The only thing that changes is how you implement what you learn.
Send me a message and let's chat about whether this course is right for you, and if not, how I can help otherwise.
p.s. Life’s too short to deal with shitty monitoring, and the worst part is, it doesn’t even need to be that way. You really can dramatically improve your monitoring in just days.