29. VPC on AWS (Part 3)
Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache conclude their series on how to set up your virtual private cloud (VPC) to run Elastic Container Service (ECS) workloads. In the first two episodes of this series, they covered the foundational groundwork for this process. Now that you have those basics in place, it’s simple to understand how to run ECS workloads in a VPC.
Some of the highlights of the show include:
- Host-based ECS involves defining clusters – sets of EC2 hosts that run workloads
- Types of ECS: host-based, where you set up, populate, and manage the clusters yourself, or managed by someone else (i.e., AWS Fargate)
- Put all machines inside ECS clusters on private subnets so they are not accessible from the outside world and have no public IPs; they are internal to the VPC only
- Let clients on the open Internet communicate via Elastic Load Balancers (ELBs) or API Gateway
- When using a private subnet, create a jumpbox or set up a VPN tunnel to access and manage clusters
- Many pieces are involved in this process; it’s complex enough that mistakes would be easy to make
- If applications are a target of hacking or have sensitive data, they need to be heavily responsible for managing their own security
- Use security groups to set ingress and egress rules that only allow ELBs to communicate to and from ECS clusters
Rich: In episode 29 of Mobycast, we conclude our series on how to set up your VPC to run ECS workloads. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.
Jon: Hello and welcome, Chris, Rich. Another Mobycast.
Rich: Hey.
Chris: Hey, guys.
Jon: Rich, what have you been doing this week?
Rich: We had a blog post that got picked up by one of the larger UI/UX newsletters called Sidebar and that’s driven a ton of traffic. I’ve been going down sort of that rabbit hole, seeing how far it’s reached. I’m trying to figure out how I can make that happen more often because it gave us a 4000% increase in traffic in the last two days. I’d love to see that be sort of consistent. Although, it’s probably not possible.
Jon: That’s exciting. Congratulations.
Rich: Yeah. Thank you.
Jon: Chris, what have you been doing?
Chris: This has been back-to-school week. It’s particularly a milestone for us now that I have two children in high school and my wife is a school teacher. Tuesday was a big day. We’re all adjusting to the new schedule and getting the […] along with it.
Jon: Same here. Back-to-school week for me, too, and our three-year-old son is starting preschool this year. Big changes.
Alright. I imagine that there may not be that many people that have hung on with us. We’re on part three now of a fairly detailed technical conversation. At the outset, in the first episode, we were ambitious, thinking that we were going to get to talking about how to create VPCs specifically for running your ECS workloads.
By the end of episode one, we had covered some fundamentals about what VPCs were, availability zones, and things like that. And then in the second episode, we still didn’t even get to ECS. We were talking about just more foundational stuff about VPCs, private and public subnets, and some security considerations. Now, I think, we’ve gotten far enough that you, you one person maybe, who has listened to the first two episodes and understands the foundational stuff that we’ve talked about and how VPCs work, are going to be ready for this transition to talking about what we should do when we’re creating workloads for ECS and how we should set up those VPCs.
Chris, did I summarize it right? Did you want to add anything more to the summary, where we’ve been, and what we’re going to do today?
Chris: No, I think you stated it well. Basically, we covered a lot of foundational groundwork, and with a good understanding of that, it sets the stage for how you actually run your ECS workloads in a VPC in a best-practice way. It’s actually super simple. Once you have those basics in place, there’s not a lot to it. Hopefully, the homework we’ve done over the last two episodes will pay off here and this will be a pretty straightforward discussion of how to get ECS running inside your VPC.
Jon: Great. What is this super simple thing that we have to do?
Chris: Maybe just to refresh a little bit. We talked about the concept of public subnets and private subnets inside your VPC. That’s an important point to remember here. Another consideration that we have when we set up ECS is what needs to talk to these services that will be running on ECS. Do they need to be accessed from the open internet or are they something that’s just within the VPC only?
Some good examples of this would be, let’s say you have some clients, whether they be a web client or maybe a mobile app client, those are going to be outside your VPC. They’re out running on the open internet. Those obviously will have to come into some front door that’s publicly accessible. Whatever resources that are responsible for establishing those connections will need to be on a public subnet.
Conversely, you may have some microservices that provide foundational components to the rest of your architecture and those may not be directly hit by outside components. They may only be consumed by your back-end services themselves.
Jon: And that was a bit of a mouthful, these foundational microservices that serve up stuff that you only need to see inside private subnets. Can you give me an example of what one of those might be? What information it might serve up?
Chris: Maybe an example would be, let’s say you have a photo sharing app on your mobile phone. You have maybe a front-end API service that’s providing the frontline communication with that mobile app. Things like, “Oh, I’m uploading a photo.” or, “I want to go get the high resolution version of this photo or I want to see all photos that have been tagged with ‘cat,’” or something like that. That’s going to be out on the open internet.
Maybe you have multiple applications across all your projects that have this basic fundamental need of saying like, “I need to store an image somewhere inside Amazon, inside the cloud, and I need to retrieve it. I need to do things like compression or there’s metadata information I want to track.” It’s kind of a common thing. This whole idea of storing an image and retrieving an image may not just be used by that application but by other applications that you may have.
Maybe the way that you architect it is you create another microservice, and all it does is deal with images. It determines how they get stored efficiently and how they get retrieved efficiently, and it handles all the security considerations associated with that. That would be something that’s probably within the VPC only. It would be basically protected on the private networks because it’s only going to be consumed by your other services.
In our case, the API service that the mobile clients are talking to, in order to handle the upload request that comes from the mobile app, may very well defer some of that processing to this back-end helper microservice that knows everything about images.
Jon: Okay, that makes sense to me. We have examples and hopefully just conceptually understand the difference between services that would live in a public subnet and ones that would live only in a private subnet.
Chris: We’ve talked before about how ECS is composed. When you’re running host-based ECS, you define clusters. Clusters are sets of EC2 hosts that can run your workloads on them. That’s what we’ll talk about here.
Rich: Hey, this is Rich. Please pardon this quick interruption. We recently passed an internal milestone of 10,000 listens, and I want to take a moment to thank you for the support. I was also hoping to encourage you to head on over to iTunes and leave us a review or a rating. Positive feedback and constructive criticism are both incredibly important to us. Give us an idea of how we’re doing and we’ll promise to keep publishing new episodes every week. Alright, let’s dive back in.
Jon: Are you going to do the thing I was hoping you would do? You just said host-based ECS, and for some reason, I don’t remember using that specific term before. Host-based as opposed to what? What other types of ECS are there?
Chris: A relatively recent development from AWS is Fargate. This is basically where you don’t have to manage the hosts anymore to run your ECS workloads. It’s a kind of serverless ECS where basically AWS is managing your hosts for you. We’re specifically talking about host-based ECS where you’re setting up your cluster and you’re populating that cluster with EC2 machines.
As part of that, we need to decide, for these EC2 host machines that make up a cluster, where they reside, what they look like, and how they get created. You’ll have a launch configuration associated with this to dictate what those EC2s look like and where they get placed, and then you’ll also have an Auto Scaling group definition, which is where you control how many of these nodes you have, when you scale up, when you scale down, and whatnot.
One of the best practices with ECS from a networking perspective is that regardless of whether or not the service is providing capabilities to things that are out on the open internet, there’s really no reason to have those EC2 machines on public subnets. Best practice would be to take all of the machines inside your ECS clusters and put them on private subnets so they’re not accessible from the outside world. They won’t have public IPs. They’re just blocked off from the net. They’re internal to the VPC only. That’s one thing that you want to do.
Then how do you do that? How do you actually let these clients that are out on the open internet talk to these things? That’s all done through however you’re directing traffic to them, whether it be a load balancer or maybe API Gateway. In our case, a lot of the time we will use load balancers, and that’s our preferred choice for how we set up our architecture.
For those services that need to have clients out on the open internet, we’ll create front-end ELBs for them and those ELBs will be on public subnets. But the ELB can surely talk to the private subnet. The ELB is a proxy and that’s the only thing that’s on the open internet. You can lock down which ports it accepts ingress traffic on. Then you can also lock down what goes out of it to your back-end machines as well.
Something to keep in mind is that, for your public-facing services, you front them with an ELB that’s on a public subnet. Conversely, for those services that are to be consumed privately, so your foundational microservices like we’ve talked about, maybe having N of those services, front those with an ELB that’s on a private subnet. The only things that can actually access that service through that ELB would be machines that are actually within the VPC itself. We have that locked down and secure.
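The placement rules Chris describes can be sketched as a small check over a deployment plan. This is only an illustration in plain Python, not AWS tooling: the `Resource` type, the `kind` labels, and the resource names are all invented for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str          # "ecs_host", "public_elb", or "internal_elb" (labels invented here)
    subnet_type: str   # "public" or "private"


# Placement policy as described in the conversation:
# cluster hosts are always private; only public-facing ELBs sit on public subnets.
REQUIRED_SUBNET = {
    "ecs_host": "private",
    "public_elb": "public",
    "internal_elb": "private",
}


def placement_violations(resources: List[Resource]) -> List[str]:
    """Return one message for every resource placed on the wrong kind of subnet."""
    problems = []
    for r in resources:
        expected = REQUIRED_SUBNET.get(r.kind)
        if expected is not None and r.subnet_type != expected:
            problems.append(
                f"{r.name}: {r.kind} should be on a {expected} subnet, found {r.subnet_type}"
            )
    return problems


# Hypothetical plan with one deliberate mistake (a cluster host on a public subnet).
plan = [
    Resource("api-elb", "public_elb", "public"),
    Resource("image-elb", "internal_elb", "private"),
    Resource("cluster-host-1", "ecs_host", "private"),
    Resource("cluster-host-2", "ecs_host", "public"),
]

for message in placement_violations(plan):
    print(message)
```

Running it flags only `cluster-host-2`, the one host that ended up on a public subnet.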
Jon: That makes sense. I follow what you’re saying in terms of getting an API access either inside the private subnet or publicly. But I guess one question that occurs to me is, sometimes the access you want to these machines is not just API access. Sometimes you want to actually log in and see what’s happening on a machine that maybe has several containers running on it. One of these EC2 instances may have several tasks which are Docker containers running. If you’re always putting your ECS cluster in a private subnet, how can you get in there and see stats in there?
Chris: You have a couple of choices there. One thing you can certainly do that’s pretty straightforward and probably the easiest thing to do which is create a jump box. You would have a single—
Jon: Is that on something as […] server?
Chris: Yup. You have a single machine that allows SSH access and you lock it down to specific source IP ranges. Maybe from your office, you know what your IP range is going to be, so you lock it down so that only origination IPs from that range can actually access that jump box. Once you’re in the jump box, you’re within the VPC. From there, you can SSH into one of the machines that’s on a private subnet. That’s one approach. Again, it’s pretty easy to set up, but it does require making sure you have set it up correctly. It’s also pretty easy to screw it up and make it too wide open, so you have to deal with that.
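As a rough illustration of the "too wide open" risk, here is a minimal sketch that checks whether the SSH source ranges on a jump box stay inside a known office range. It uses only Python's standard `ipaddress` module; the office range (a reserved documentation block) and the function name are assumptions made for the example, and it handles IPv4 ranges only.

```python
import ipaddress
from typing import Iterable, List

# Hypothetical office range; 203.0.113.0/24 is reserved for documentation use.
OFFICE_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def too_open_ssh_sources(rule_cidrs: Iterable[str], allowed=OFFICE_RANGES) -> List[str]:
    """Return the SSH source CIDRs that are not fully contained in an allowed range."""
    offenders = []
    for cidr in rule_cidrs:
        net = ipaddress.ip_network(cidr)
        # subnet_of is True only when every address in `net` lies inside the allowed range.
        if not any(net.subnet_of(a) for a in allowed):
            offenders.append(cidr)
    return offenders


# A rule scoped to part of the office range passes the check.
print(too_open_ssh_sources(["203.0.113.16/28"]))
# A world-open rule (0.0.0.0/0) is reported as an offender.
print(too_open_ssh_sources(["203.0.113.16/28", "0.0.0.0/0"]))
```

A check like this could run against whatever list of source ranges you export from your jump box's rules; how you obtain that list is left out here.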
The preferred way of doing this is to set up a VPN tunnel between you and your Amazon VPC. There are lots of different products and services out there for creating these VPN tunnels. There’s some stuff out there that’s free, like SoftEther. There are other ones that are relatively low cost, like OpenVPN. There are tens of other products out there from other big networking vendors on the AWS Marketplace and whatnot.
There are lots of options out there for doing it, from free to something that’s paid and supported. But again, it’s pretty straightforward to get that set up. I would definitely recommend that you set up a VPN tunnel for your VPC. That way, when you want to access one of these machines that’s on a private subnet, you just establish a VPN connection, and once you have that connection in place, you’re within the VPC itself, so you can SSH directly from your machine into that host using its private IP address.
Jon: Something that occurs to me thinking about this is that there are a lot of pieces to this. There’s setting up the VPC with its subnets and their security rules, there’s attaching internet gateways, and there’s potentially having a bastion server or creating a VPN. There’s this whole idea that we’re following some best practices by running our ECS workloads inside a private subnet.
Despite the fact that we’re following these best practices, it does feel complex enough to me that mistakes will be quite easy to make, even by very well-intentioned, very good, organized people. I guess the takeaway in my mind is that if your applications are a target of hacking or have sensitive data or personally identifiable information in them, not only do you need to follow these best practices in terms of your AWS setup and your ECS workloads, but really your applications need to be heavily responsible for managing their own security and securing their own stuff.
That means good authentication, authorization, all that stuff needs to be super, super good across all your services, applications, etc., within ECS, because it just feels really very complex. This whole orchestration of network and components inside AWS is not enough. That’s the takeaway I’m feeling.
Chris: Yes, security is an onion with many layers, for sure, so don’t rely on one layer. Have backups in case there’s a breach at one layer, there’s another layer that’s protecting it.
Jon: One of the easiest things in the world would be, “Oh, I was away from the machine that had a VPN and I needed to go and do something on the ECS cluster, so I set up a quick bastion server. I forgot to narrow it down to just the IP address that I was coming in from, so I left it open to the world. I was going to shut it down and I forgot.” And oops, now suddenly you have a wide-open VPC.
Chris: This is absolutely the stuff that happens every single day. These are the things that you read about in the news, that there’s been a breach, and it happens to companies small and large. It’s one of those things where most of the time I think we all know what the right thing is to do, but that ends up being too cumbersome.
Jon: It’s hardly ever the expedient thing.
Chris: A really good litmus test of how secure things are is the convenience factor of how you do things. If something is really easy, simple, and convenient, it’s probably a smell that says, “It’s not very secure.”
Jon: Alright, that’s interesting.
Chris: If you’ve got to do a few steps before you can make that connection, that’s probably a good sign. Just another thing to round out the best practices for setting up ECS on your VPC: in addition to being smart about private versus public subnets and always putting your cluster EC2s on the private subnets only, you should also use security groups to your advantage to further lock this down.
You can set the ingress and egress rules on your ELB at a really fine level of detail. I would recommend, on these ELBs that are fronting your ECS services, creating a security group rule that only allows those ELBs to talk to your ECS cluster. There is no reason for an ELB to talk to anything else. Conversely, set up a security group for your ECS host machines such that their ingress only allows connections from your ELBs. There’s no reason for them to accept port 80 traffic or port 443 traffic from any other machines. That’s really the only way a request should be routed into them. With a few simple rules, you can really lock this down and make it so that the surface area is much smaller.
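To make the pairing of rules concrete, here is a toy model of the two security groups just described. It is a simplification written in plain Python, not the AWS API: the group names, the single fixed port 80 between ELB and hosts, and the `can_connect` helper are all assumptions for illustration, and it ignores details such as the stateful behavior of real security groups.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rule:
    port: int
    peer: str   # a security group name, or a CIDR string such as "0.0.0.0/0"


@dataclass
class SecurityGroup:
    name: str
    ingress: List[Rule] = field(default_factory=list)
    egress: List[Rule] = field(default_factory=list)


def can_connect(src: SecurityGroup, dst: SecurityGroup, port: int) -> bool:
    """True only if src may send to dst on `port` AND dst accepts it from src."""
    out_ok = any(r.port == port and r.peer == dst.name for r in src.egress)
    in_ok = any(r.port == port and r.peer == src.name for r in dst.ingress)
    return out_ok and in_ok


# The public ELB accepts HTTPS from anywhere but may only talk to the cluster hosts.
elb = SecurityGroup(
    "elb-sg",
    ingress=[Rule(443, "0.0.0.0/0")],
    egress=[Rule(80, "ecs-hosts-sg")],
)
# The cluster hosts accept traffic only from the ELB's security group.
ecs_hosts = SecurityGroup("ecs-hosts-sg", ingress=[Rule(80, "elb-sg")])
# Some other machine in the VPC that tries to reach the hosts directly.
other = SecurityGroup("other-sg", egress=[Rule(80, "ecs-hosts-sg")])

print(can_connect(elb, ecs_hosts, 80))    # the ELB is allowed through
print(can_connect(other, ecs_hosts, 80))  # the hosts' ingress rule blocks everything else
```

The point of the sketch is that both sides are restricted: the ELB has nowhere to send traffic except the cluster, and the cluster has no way to receive traffic except from the ELB.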
Jon: Super good idea. Great, I think that we finally turned the corner and talked about ECS and VPCs so we can pat ourselves on the back. Is there anything else before we sign out for today and say goodbye?
Chris: No, just thanks for sticking with us. Again, I think the ground we covered in the last two episodes is foundational stuff; we just need to understand what makes up a VPC and all its components. But when you actually say, “Okay, I want to run something on that, like ECS,” it’s pretty straightforward and you can keep it really, really simple.
This is one of the benefits of containers: you don’t have to have one machine per service. You have this pool of resources that are your cluster machines and you have N services that you want to run, and you let something like ECS schedule them and run them. You just have to do a little bit of configuration and maintenance on those cluster resources and make sure that you set up a couple of security groups and a couple of private and public subnets, and you’ll be in a very, very good security posture, with a very efficient way of running your code.
Jon: Right. Thanks again, Chris. Thank you, Rich, as always for putting this together for us.
Rich: Yeah. Thank you.
Jon: Talk to you next week.
Chris: See you guys. Bye.
Rich: Well dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources is available at mobycast.fm/29. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.