
67. Real World AWS – Using Custom CloudWatch Metrics to Monitor Disk Space

Chris Hickman and Jon Christensen of Kelsus and Rich Staats of Secret Stache discuss using custom CloudWatch metrics to monitor disk space.

 

Some of the highlights of the show include:

  • Trade Secrets: What you can do to monitor, manage, and not run out of disk space
  • Consumption Culprits: Apps take disk space during normal operation for log files, container images, and bugs (e.g., CannotPullContainerError: no space left on device)
  • Free standard EC2 metrics included with CloudWatch: CPU utilization, disk and network throughput, and status check
  • Sound the alarm! Develop custom metrics for monitoring disk space and memory by installing Perl environment, dependencies, and script (mon-put-instance-data.pl)
  • Create Custom Metric: Name, dimensions, data, and publish
  • Keep it Simple: df tool + Awk for parsing /bin/df -k -l -P / | tail -n 1 | awk '{print $4 " kb"}'
  • Key Components of Alarms: Name, metric and level of granularity to monitor, dimensions, conditional expression, and alarm action
  • EC2 Host Initialization Process: Schedule cron job to sample disk space, and publish as custom CloudWatch metric
  • CloudWatch alarm for instance’s disk space metric based on minimum threshold:
    • user_data script
    • Install CloudWatch alarm for disk space:
      • aws cloudwatch put-metric-alarm …
    • Create cron job that pushes custom CloudWatch metrics for disk space:
      • echo -e "SHELL=/bin/bash\n*/5 * * * * root bash < /tmp/put-disk-metrics.sh" > /etc/cron.d/ecs-disk-metrics
  • Considerations: EC2s are cattle, not pets; instances end, and new ones spin up; retention period is 15 months; custom metrics can’t be deleted, but alarms can
  • Recommendation: Don’t worry about retention of CloudWatch metrics for terminated instances; focus on periodically reaping alarms for terminated instances

 

Links and Resources

CloudWatch

Amazon Elastic Container Service (ECS)

Amazon Elastic Compute Cloud (EC2)

AWS Lambda

WordPress

WPEngine

Docker

Perl

Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances

Python

Bash

Slack

Kelsus

Secret Stache Media

Rich: In Episode 67 of Mobycast, Jon and Chris discuss using custom CloudWatch metrics to monitor disk space. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.

Jon: Welcome, Chris and Rich. It’s another episode of Mobycast.

Rich: Hey.

Chris: Hey, guys. Good to be back.

Jon: Rich, I saw you yesterday. We saw each other in person at a client meeting. That was a treat.

Rich: It was good. It was a fun conversation. Nice to get out of Denver.

Jon: And I got out of Eagle, too. We met in the middle and got a free lunch out of it.

Rich: Definitely.

Jon: Free lunch via beautiful flowing Colorado creek. It was amazing. Chris, what have you been up to?

Chris: Well now I’m just jealous. Free lunch by the side of the creek in Colorado? #jealous

Jon: Yeah. Anything else other than [00:01:07] going on in Seattle these days?

Chris: School year’s winding down and this is a big one because my son is graduating high school. Tonight is the graduation ceremony.

Jon: That is a big one. Next year you’re getting a boat and going sailing, right?

Chris: No. This is where the tip jar comes out for Mobycast to help pay for college. There’s no business sailboat.

Jon: We’re playing with a new thing. What’s it called?

Chris: Glow.

Jon: Glow, that's right. It's named after a Netflix series. It's just that thing where people who are into Mobycast can give us a few bucks. I don't know. We'll play with it and see if we like it. We're not doing this for the money, obviously. We're doing it because we just like to talk so much and we like to record ourselves talking.

Today we're going to talk about disk space, which is a thing that has a tendency to bite you when you're running production systems. It's a finite resource and you can run out of it. When you run out of it, computers don't like it and they will stop running your code, and then you'll be down. Sometimes it's really hard to recover from, or you don't even realize that's why you're down. If you've run production systems for more than 19 days, you've probably had a disk outage.

We're going to talk about that in regards to cloud-native development, particularly around AWS because that's what we know, and just talk about what you can do to make sure you don't run out. I was trying to think of a metaphor. We're going to do a little bit of black-belt-level AWS conversation here, the stuff you don't learn in a one-hour "how to use Elastic Container Service" talk that you go to at a conference. This is when it gets real. But I don't like the black-belt metaphor, so let's find another one.

Rich: It’s like the trade secrets.

Jon: Yes, there you go.

Rich: It's like when you hire a plumber. What took you four hours takes them three minutes because they've done the hard work, they've done it before, they know the ins and outs. Hopefully that's what we can do here: save some folks a lot of trouble and hair-pulling.

Jon: You're making me laugh because I didn't hire a plumber recently, and it's a good thing the snake was 15ft long, because I only had 5ft to spare. It definitely took me three hours, and Rich, I would have called a plumber. Let's get started, let's talk about disk space. Kick this off, Chris.

Chris: In a way, this is really weird to talk about this because here we are 2019, cloud has been around for a long time. We have AI, ML, facial recognition where all this stuff is in the news now. Technology, it just leaps and bounds. It’s just amazing how fast things are advancing and yet here we are, we’re talking about how to monitor disk space.

It's sad, but it's a very practical problem that folks have. As you mentioned, the longer a host is up and running, the greater the likelihood that it's going to have problems with disk space. That's because the apps that we deploy and run there are consuming disk space as part of their normal operation, whether it be in the form of log files, which are definitely one of the more likely suspects for growing disk space usage, or things like container images, or bugs, where you have code that's writing temporary files, maybe doing image manipulation or creating PDF files, and then just not cleaning up after itself. It is something that we have to deal with when running production loads in the cloud. You do run into this where all of a sudden something's not working, and that's because there's just no more disk space available. So, how do we even get to that point?

Jon: I'm curious, Rich. You run production workloads for WordPress with WP Engine, and I'm just curious whether this is ever an issue over there, or whether WP Engine takes care of this side of the world and you've just never seen this problem in your dealings with WordPress. Just curious.

Rich: We have a limit. I forget, it’s like a 100 gigabyte limit or something like that on our server. We do sometimes get within 20% of that. When you get within 20% of that, WP Engine [00:05:55] you to clone into another environment and you need to free up resources or upgrade.

Jon: It could happen where you could go down if you weren’t cognizant, but they take care of the monitoring for you.

Rich: The way that they handle it is they just increase it for you, and then if it happens another month, they charge you. I don't know how they actually do it now, but I think I was one of the first 100 customers, and back then they just kept loading it up until they could have a conversation with you about it. But yeah, I think theoretically, if you go over, you're in trouble, and for me, that would be big trouble because we have 60 sites that are on the same box. Most of them don't do much traffic, so it's only a few of them that would be causing that problem.

Jon: Interesting. I was just curious because it’s a managed service, and Rich is friends with the team over there so he was very politic in his response which I thought was good. But going back to AWS and the wild west of deploying whatever in the cloud, disk space comes up a lot. 

Chris: As we talked about in the past, we were very much into containerization. We use ECS quite heavily for running our apps and that has its own nuances with the space requirements. It’s not just the applications that are using up disk space to do things like writing log files. It’s also a function of the number of different types of services that you’re running, because each service, each task that you’re running that’s backed by a Docker image, those are cached on these hosts. They need to be there on each host that they’re running on.

As you grow and you have more services that you're deploying, your disk space requirements are going to go up, and running out of disk space becomes even more likely. So, very much a real-world problem that you're going to run into in production.

Maybe we can start off by talking as far as AWS goes, the standard there for just monitoring stuff is CloudWatch. That’s Amazon AWS’ metrics and monitoring platform. Out of the gate, when you do have an EC2 instance, you’re going to get some standard CloudWatch metrics for free. This is the ‘batteries included’ part of it. You’re going to get things like CPU utilization, disk throughput, network throughput, and status check. Those are the standard CloudWatch metrics that you get for free. When you spin up that EC2, those metrics are now going to be available to you and you can do whatever you need to on them. You can monitor them.

Jon: Did you just say disk throughput?

Chris: Yes, and by that I mean this is basically the number of reads and write operations you’re doing as well as the bytes that are being transferred for read and write. It’s the traffic.

Jon: I’m sorry. I was developing a theory of maybe why disk utilization wasn’t a standard metric and you just crushed my theory because if they’re doing disk throughput, they’re obviously capable of knowing what disks are mounted and what they’re doing. I was thinking, “Well, it’s a little heavier lifting. You can have lots of disks attached to a machine and maybe some of them are even network mounted or whatever.” Since it’s such a configurable part of a machine, what disks it’s using, maybe they’re like, “Yeah, let’s not go into that muddy pool and let users deal with that.” But if they’re doing disk throughput, they’re in the muddy pool. They’re standing there deep in it already like, “Come on, you’re going to tell me how many bytes are going through this thing? You’re not going to give me a warning when df gets up to close to 100% on these things? Come on.”

Chris: It’s definitely at a higher level of abstraction so you think of this as more of the device driver level. This plays into the VM itself and the hypervisor.

Jon: But is it always? I mean if you type df, you get a response that has to do with the drivers that are connected to disks and what’s mounted. It’s a trustworthy operation. df is not going to tell you about a disk that’s on another machine. It’s not going to tell you about a disk that’s not mounted. Frankly, I’m disappointed with AWS right now.

Chris: Again, the real-world practicality of this is that there is a technical reason for it, because I guarantee you, more than anything, AWS has heard this loud and clear many, many times. I think it is related to the level at which the virtualization is happening. It doesn't see devices, it doesn't see mount points and stuff like that. Things are coming and going, but as that bubbles up through the virtualization stack, at some point it knows, "Hey, I'm going off and talking to the network," or, "I'm going off and talking to disk," and that's the point where it can capture metrics. That's why you can track the packets that are going back and forth, if you will, but the actual architecture underneath that, it doesn't see at that level.

Jon: Let me just say something and then you can say, "No, you're wrong." If, to get the answer, you basically have to run a user-level program on the machine, then AWS isn't going to get you that answer, because it wants to stay out of there, stay out of that user-level stuff. It only wants to be at sort of a hypervisor level.

Chris: That is true. It’s not part of the OS, so if it is data that’s only available at the OS level, then that’s not going to be something that it can do.

Jon: All right, I get it. It makes sense with EC2, it makes a little less sense with ECS because with ECS, they are running an agent. You have to let them run a program on the machine so it makes less sense there, I would say.

Chris: ECS gives you a bit more stuff out of the box for free. We talked about disk space; that's one of the things you're not going to get in the standard metrics. You'll get throughput, but not actual space.

Then the other thing that's noticeably missing is memory. That's another one that you're just not going to get without custom metrics. On the ECS side, the ECS agent will publish metrics about your memory utilization, because that's one of the constraints you have when running your containers in services. It's saying, "Hey, what's my memory limit? How much memory can I use? What's the max I can use?" Then there are metrics for your utilization and how close you are to that limit, so you can do some tuning. There is some additional stuff that you do get with these services that are built on top of it. But the ECS agent runs at a different level than EC2 does.

Jon: I guess it’s making sense. It’s the same thing as it was with disk usage. In order to know how much memory you’re using, you have to be in there, inside the operating system.

Chris: That's CloudWatch. CloudWatch has a very robust architecture for capturing these metrics, retaining them, visualizing them, graphing them. It also has the ability to monitor these things and set alarms. These are core things that we use all the time. So many things in AWS are built on top of CloudWatch metrics and the associated alarms. Auto Scaling Groups, that's how they work: they're monitoring CPU utilization or some other metric and using that to decide when to trigger an alarm to scale.

Given that, as we've talked about, it's not part of the standard metrics. So now we've got to start talking about custom metrics, because that's what's going to get us to where we need to be. In the end, what we really want to know is, "Hey, are we getting low on disk space? If we are, sound the alarm, so that we can go do something proactively before we do run out of disk space." That's what we're trying to do here. In order to do that, we need to create some custom CloudWatch metrics around disk space. We need a mechanism for doing that.

If you go do a Google search or whatnot to start doing some research into how to do this, you will find that AWS has a recommended way of doing it. They put together a collection of Perl scripts for these various OS-level metrics, metrics around memory and disk space and whatnot. You can find the pages that walk you through all the steps that are needed.

Basically, you have to manually install this stuff. You first have to install Perl and its environment and all the dependencies that are required for that, then you go and basically download the Perl script package. Once you do that, then you can actually start using it. It's a perfectly viable way of doing it, but for me, when I was doing this for our ECS clusters, it seemed really heavy-handed. I didn't want to have to install Perl and these dependencies just for a disk space metric. That's all I really wanted.

Jon: It’s a little surprising that that’s still the recommendation given what you’re about to say. Go ahead and say what you’re about to say.

Chris: You alluded to this before. You want to see what the disk space is? Just run df, and for each one of the mounts that you have, it gives you all of the stuff that you need, like total capacity, how much is available, and how much is being used. This is what df tells you. It's literally a single line of Bash.

You run df, you pipe it through tail to chop off, basically, the header row because you don't need that, and then run it through awk to say, "Give me the fourth column," because that's the disk-available number. Just with that one line of Bash, you now have your free disk space, your available disk space.
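Here's roughly what that looks like, as a sketch, assuming the root filesystem is the one you care about (the mount point and whether you tack on a unit suffix are up to you):

# Available space (in 1 KB blocks) on the root filesystem.
#  -k : report in 1024-byte blocks
#  -l : local filesystems only
#  -P : POSIX output, one row per filesystem
# tail -n 1 drops the header row; awk prints column 4 (Available).
/bin/df -k -l -P / | tail -n 1 | awk '{print $4}'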

Jon: We’ll drop that line into the show notes. I think that’s a nice little gem right there. It’s a well-written Bash script command.

Chris: Yeah. Instead of going through all the hurdles and the hoops of getting this stuff installed and putting software on your instance that you really don’t need, just keep it simple. It also really makes it very obvious what it’s doing. It makes it easier for people to understand what’s going on as opposed to all this other stuff of installing a bunch of packages and then this black box of scripts that are being downloaded.

Jon: Right, and it's also a little surprising to me because doesn't Amazon Linux come with pre-installed scripting languages? Doesn't it already have, say, Python on the machine with no installation necessary?

Chris: It depends on what AMI you’re using. But yeah, there’s a range of stuff there.

Jon: Yeah. Everything’s going to at least have shell and probably Bash.

Chris: Yeah, unless you're Alpine and then [inaudible 00:18:27] it will still do the stuff. Everything will have df. df and awk, like tail, this stuff is going to be there. That's all we really need to get our disk space metric. Now we just need to create this custom metric and publish it to CloudWatch. For that, we really just have to make some decisions on how we're going to name it. In this case, it's pretty easy. We can pick a name like Available Disk Space. That's our metric name.

Then we decide on what the dimensions are. The dimensions are the way that you specify the level of granularity, like at what level is this being measured? We definitely want to do this at the instance level because this is definitely an instance value.

Depending on your setup, you may want to have further groupings. These EC2s are hosts inside ECS clusters. For us, it makes sense to have this metric at the instance level along with the ECS cluster level. If I'm looking at my metrics, I can say, "Oh, I'm going to go look at all the metrics in the development cluster of ECS," or, "I want to see the metrics in the staging cluster, the production cluster." For each one of the instances that are in that cluster, now I can inspect those even further.

It's a way of grouping with these dimensions. It's just going to be up to you what makes sense. If you don't have clusters, it may just be per instance, or you could do it per Auto Scaling Group or something like that. Just something for you to think about and make a decision on.

Jon: It makes sense.

Chris: That’s really all we need to do for creating our custom metric, getting that data, and publishing it. The next part is like, “Okay, we want to have an alarm that tells us when we need to be aware of this.”

Jon: Before you go into alarms, there is one thing that I don't quite understand. We have this little line of Bash that can run and get information out. Is that something that we tell CloudWatch to run, or is there a place where you go paste this? That's what I'm curious about. Or is there something you set up on the computer, where if you dump the thing into the right place, CloudWatch knows how to find it?

Chris: To publish your metric to CloudWatch, it's an API call or a CLI call, and you're basically just saying, "Okay, here's the name of the metric, here's the dimensions, and here is its value." What that Bash line gives us is a number, the free disk space, and we follow that up by making our API call or CLI call to CloudWatch: for the available disk space metric, for this instance ID and for this ECS cluster, here's its value.
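As a sketch, that CLI call could look something like this. The namespace, metric name, and cluster dimension are illustrative choices rather than anything prescribed by AWS or the episode; the instance ID comes from the EC2 instance metadata endpoint, and the instance profile is assumed to allow cloudwatch:PutMetricData:

# Sample free disk space and publish it as a custom CloudWatch metric.
FREE_KB=$(/bin/df -k -l -P / | tail -n 1 | awk '{print $4}')
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

aws cloudwatch put-metric-data \
  --namespace "Custom/ECS" \
  --metric-name "AvailableDiskSpace" \
  --unit Kilobytes \
  --value "$FREE_KB" \
  --dimensions InstanceId="$INSTANCE_ID",ClusterName=my-cluster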

Jon: Okay, and I see we are going to get to that later. I didn't understand before; now I do. Let's talk about alarms and then come back and put it all together.

Chris: Alarms. Some key components there: we've got to figure out a good name for them, naming them in a way that allows us to organize them and find them. We're going to specify what metric we want to monitor, and then we're also going to specify the dimensions again, like what's the level of granularity for monitoring on this. We're going to define a conditional expression in this alarm. That's going to be composed of a threshold value and a comparison operator. This could be: if the available disk space is less than five gigabytes, that's going to trigger the conditional; or it could be: if CPU utilization is greater than 80%, that's going to trigger the conditional. That's a key part of the alarm.

Then we're also going to specify an action. What should happen if this conditional evaluates to true? Typically, we'll probably post to an SNS topic, and then from there, additional things can happen. For our particular case with disk space, we're probably going to want an email sent, or we could have something published to Slack, or whatever. But just keeping it easy, we'll say, "Hey, this is going to trigger an email whenever the available free disk space is below, say, five gigabytes." There'll be an SNS topic we can create, and whoever wants to subscribe to it can, with their email address. When this alarm gets triggered, there's going to be an email sent to everyone on that topic letting them know, "Hey, this alarm has been triggered. Here's the value and you should do something about this."
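A sketch of the corresponding alarm, using the same illustrative names as the metric above; the instance ID, threshold, and SNS topic ARN are placeholders, with 5 GB expressed as 5,242,880 KB to match the metric's unit:

# Alarm when this instance's available disk space drops below ~5 GB.
aws cloudwatch put-metric-alarm \
  --alarm-name "my-cluster-AvailableDiskSpace-i-0123456789abcdef0" \
  --namespace "Custom/ECS" \
  --metric-name "AvailableDiskSpace" \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 Name=ClusterName,Value=my-cluster \
  --statistic Minimum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 5242880 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:disk-space-alerts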

Jon: CloudWatch really works. I have a quick story to tell you, Chris. I think it was three or four years ago, I was playing around with some AI stuff and I decided I was going to try to make a data set based on Twitter. I wanted to get tweets, but only while the market was open, because tweets outside market hours were irrelevant to what I was trying to calculate. I created a CloudWatch event or a CloudWatch alarm that would go and look to see if the market was open and then send me an email when it opened and send me another one when it closed. It would kick off other things, but I would know that it was working. I still get those emails every day.

Chris: Unsubscribe. Actually, when you think about it from a distributed infrastructure standpoint, this is pretty amazing engineering, what they've done with CloudWatch: the metrics, the graphing, and the alarming. Just being able to monitor all this stuff, to trigger alarms, and to be so reliable, that is quite the engineering feat. Tip of the hat to you, AWS.

Jon: A little bit schizophrenic on how we feel about AWS during this episode.

Chris: There's a little bit of carrot and a little bit of stick. Those are the key components of a CloudWatch alarm, and again, you can create these alarms via API calls, via CLI calls, via SDKs; there are all the normal ways of being able to automate this stuff.

To put this all together, we’re running ECS clusters and each one of those clusters is backed by a launch configuration and a scale group. What we really need to do is we want to do two things during our host initialization. One is we want to schedule a cron job that’s going to go periodically determine how much disk space we have and then publish that to CloudWatch as a custom metric. That’s the one thing that we need to do.

The second thing is to go create a brand new CloudWatch alarm for that particular instance's disk space metric that it's going to be publishing, with whatever the threshold may be; we'll just call it five gigabytes. Those are the two things we have to do during host initialization. After that, we're done. We're going to use the user data script as the way of doing this. It's a Bash script that's run while EC2 is spinning up that instance, where you get to run whatever kind of custom code you want, once, whenever that instance initializes. Then we'll be on our way.
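Putting those two host-initialization steps together, here's a rough sketch of the relevant fragment of a user_data script. The cluster name, namespace, metric name, and SNS topic ARN are illustrative placeholders, and the instance profile is assumed to allow the CloudWatch calls; the cron line is the one from the show notes:

#!/bin/bash
# Hypothetical user_data fragment; runs once when the EC2 instance initializes.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# 1) Install a CloudWatch alarm for this instance's disk space (below ~5 GB).
aws cloudwatch put-metric-alarm \
  --alarm-name "my-cluster-AvailableDiskSpace-${INSTANCE_ID}" \
  --namespace "Custom/ECS" --metric-name "AvailableDiskSpace" \
  --dimensions Name=InstanceId,Value="${INSTANCE_ID}" Name=ClusterName,Value=my-cluster \
  --statistic Minimum --period 300 --evaluation-periods 1 \
  --threshold 5242880 --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:disk-space-alerts

# 2) Write the sampling script, then schedule it with cron every 5 minutes.
cat > /tmp/put-disk-metrics.sh <<'EOF'
#!/bin/bash
FREE_KB=$(/bin/df -k -l -P / | tail -n 1 | awk '{print $4}')
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws cloudwatch put-metric-data --namespace "Custom/ECS" \
  --metric-name "AvailableDiskSpace" --unit Kilobytes --value "$FREE_KB" \
  --dimensions InstanceId="$INSTANCE_ID",ClusterName=my-cluster
EOF

echo -e "SHELL=/bin/bash\n*/5 * * * * root bash < /tmp/put-disk-metrics.sh" > /etc/cron.d/ecs-disk-metrics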

Jon: The thing about that, though, is I wonder if there's another way, because I think of CloudWatch as the cron of the cloud, and CloudWatch itself can run things on a schedule. It would be neat if you could just say, "Hey, for anything that's in this Auto Scaling Group or this ECS cluster, just reach in there, run this, and then get the output," so that CloudWatch could be the cron instead of having to do any scripting at all. It would be neat.

Chris: Yeah. It gets a little bit more complicated.

Jon: Because probably some more heavy lifting around, I am [00:27:21] as an access to the machines and stuff like that went stuck, but it would be cool if you could do that.

Chris: And just determining the membership set, and how you actually specify the different types of membership sets that there could be, whether it's an ASG, or something else, or a load balancer. How do you actually define that? It feels like it would be pretty challenging and complicated.

Jon: Maybe worth it, though, because if you have a lot of machines out there, one of the things I feel is true about cron is that it's where things go and happen in the dark. Cron is not really letting you know what it's doing, and it's got a secret little log file that you have to go find to see when things broke. It's a little pain in the butt.

Chris: It's like serverless in a way. I mean, you have the same problem with Lambda, especially scheduled Lambdas. Scheduled Lambdas are the cron of the cloud. I totally agree that when you're dealing with cron jobs, it's much less visible. It really is like invisible code.

Jon: Here's another good reason to have a CloudWatch event go run the thing and then get the information back. The CloudWatch event goes and reaches into the machine, it says, "Do this," and then the thing comes back and says, "Here's my result." The reason is, if cron stops working for whatever reason, maybe it's out of disk space so it can't do the thing that it's supposed to do, it's not going to tell anybody. It's just going to be a lack of information, and with any kind of alerting, alarms, or monitoring, it's the thing that doesn't happen that's often the hardest thing to monitor.

At least if CloudWatch was reaching in and saying, "Run this little df command and give me the result," then if it wasn't able to do that, it might be like, "Hey, hey, everybody, I wasn't able to do this thing I'm supposed to do all the time," whereas cron will be like, "I'm secretly putting a little failure message into a secret file that nobody does not [00:29:25]."

Chris: This is the way that we've set it up. Everything's running on the instance itself, using cron to sample every five minutes and then publish metrics to CloudWatch. Another way you totally could do this is to create a Lambda. Have the Lambda scheduled so that it's running once every five minutes and have it go out. It can make the API calls. It can be configured with what cluster you want to look at. It can do the API calls to get the membership set of what instances are running inside that. It can then make calls to your instances themselves; you can expose some hook there for them to report back, "Here's the metrics," and then your Lambda can publish that to CloudWatch. You can use Lambda as your cron, type of thing.

Jon: Yeah, it gets into those trade-offs, like the development team is like, “Yeah, we were totally going to write that feature that lets users upload this thing to their profile but we spent the last few weeks building a disk-free utility.”

Chris: Indeed. At the end of the day, it ends up being really that simple. We started off saying, "Hey, it's really unfortunate that you have to do some lifting here to get this done," but I think the important point is that it's really not that much. We've talked about this one-liner of Bash using df to get the free space; just do an AWS CLI call to put that metric to CloudWatch, publishing it with cron every X number of minutes, every five minutes. Then also just wire up a new alarm based on that metric. With those two small tweaks to our user data script, we're now going to get an alarm triggered whenever we go below that disk space threshold.

Jon: Then the next thing we do is to hook up another little script that goes and grabs you a little more disk space and sends you an email saying, “Hey, I just got you little more disk space just so you know.”

Chris: Yeah, and that boils down to how you set up your EC2 instances. Is it the root? Is it the OS volume? Is it [00:31:55]? All that kind of good stuff. Just getting the email notification, that's definitely a good step in the right direction.

Jon: For sure.

Chris: Maybe some practical things to consider about this. Again, we treat our EC2s as cattle, not pets. These things are coming and going, they're terminating all the time, new ones are spinning up; there's just that churn. We get this question of, all these metrics and alarms are per instance, and hopefully those instances are short-lived. They're definitely not going to be lasting for months and months and months; it's more on the order of days or weeks. So is there cleanup involved here?

Something that's surprising, or maybe not surprising when you think about it: you cannot delete a CloudWatch metric. Once it's been published, you can't delete it. Just not an option. Also, the retention period now for CloudWatch metrics is 15 months, so that's 455 days. We create an instance, we publish some metrics for it, the instance is terminated, and those metrics will be there for 15 months. That's not so great given that we have this flux of things coming and going.

Jon: Yeah, don’t put personal information in your CloudWatch metrics is what I’m hearing.

Chris: Absolutely because you can’t scrub it. It’s there forever or for 455 days. But with CloudWatch alarms, you can delete those. I would say the recommendation here is don’t worry about the CloudWatch metrics for your terminated instances because really, you shouldn’t be looking at those directly. We really only need the metrics to wire up the alarms. It’s really the alarms that we care about.

The only other piece here is that we have to periodically reap any CloudWatch alarms that are out there for instances that no longer exist. That should be the proper amount of cleanup for us. That can be, again, a Lambda job that runs once every day or something like that, and just goes and queries for the instances that are active and compares that against our CloudWatch alarms. When you name these things, you should definitely have a pattern so you can identify what they are.

For me, for my CloudWatch alarms, I chose a naming scheme of ECS cluster name-metric name-instance ID. The names are composed of those three variables, so it makes it really easy for me to figure out which instance ID an alarm is associated with. Doing that reaping becomes pretty straightforward.
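A rough sketch of what that daily reaper could look like with the AWS CLI, whether it runs in a scheduled Lambda or elsewhere; the cluster and metric in the alarm-name prefix follow the naming scheme above and are illustrative:

#!/bin/bash
# Hypothetical alarm reaper: delete disk-space alarms whose instance is gone.
# Alarm names are assumed to follow <cluster>-<metric>-<instance-id>.
PREFIX="my-cluster-AvailableDiskSpace-"

# Alarms created under the naming scheme.
ALARMS=$(aws cloudwatch describe-alarms --alarm-name-prefix "$PREFIX" \
  --query 'MetricAlarms[].AlarmName' --output text)

# Instance IDs that still exist (recently terminated ones may linger briefly).
LIVE=$(aws ec2 describe-instances \
  --query 'Reservations[].Instances[].InstanceId' --output text)

for ALARM in $ALARMS; do
  INSTANCE_ID="${ALARM#$PREFIX}"   # strip the prefix to recover the instance ID
  if ! grep -qw "$INSTANCE_ID" <<< "$LIVE"; then
    echo "Reaping alarm for terminated instance: $ALARM"
    aws cloudwatch delete-alarms --alarm-names "$ALARM"
  fi
done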

Jon: Makes sense.

Chris: Yeah. So now no more excuses. Go monitor your disk space and get alarmed.

Jon: Excellent. That was super interesting. I don't know if it ever stops being a problem until we just have auto disk expansion capabilities. Never run out of space, just automagically make me more space, and then maybe this problem will eventually go away.

Chris: Indeed. Of course, you’ll pay for it.

Jon: Yes, you will. Thanks so much, Chris. Thanks, Rich.

Chris: Thanks, guys.

Jon: Talk to you next week.

Chris: All right, see you.

Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources, is available at mobycast.fm/67. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you, and we'll see you again next week.
