27. Setting up Virtual Private Clouds on AWS (Part 1)
Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache begin a new micro-series on setting up Virtual Private Clouds (VPCs) for ECS on Amazon Web Services (AWS). They discuss VPCs, sub-nets, availability zones, and other components.
Some of the highlights of the show include:
- VPC is the basic unit of isolation from a networking perspective of building out your Cloud-based infrastructure in AWS
- When creating an AWS account, a default VPC is set up; so when working with AWS, you’re always working in a private Cloud, not a public one with all AWS customers
- When creating a VPC, decide on the basic address space to use and how big it should be: What is the VPC going to be used for? How big of an address range do you need?
- Slice the VPC into sub-networks; a sub-net can only exist in a single availability zone and cannot span availability zones; an availability zone is like a data center, one of the physical building blocks
- Availability zones are geographically disparate from each other; an availability zone could be spread over several buildings, all within a couple of miles of each other
- Availability zones can have infrequent outages; people panic because they didn’t think through all the implications of what it means for an availability zone to go down
- Pick a region – choose what’s convenient or where most of your users are
- Up to this point, the VPC is like a network switch that you can plug cables into and hook computers up to; the next step is giving that network access to the outside world via the Internet
- Inbound Access: Things from the Internet being able to connect to those machines; set up an Internet gateway – a proxy server to connect with other computers inside your VPC
- Outbound Access: How machines make calls out to the Internet
- Routing tables are another component of every VPC setup; define what routing table each subnet should use; routing tables dictate which traffic patterns are allowed
- As a software developer, you need to have a fundamental understanding of TCP/IP networking; it’s always something that you’ll rely on throughout your career
Rich: In Episode 27 of Mobycast, we start a new micro-series discussing the setup of virtual private clouds on AWS. Welcome to Mobycast, a weekly conversation about containerization, Docker and modern software deployment. Let’s jump right in.
Jon: All right. Hello. Welcome, Rich and Chris. It’s another Mobycast.
Rich: Hey.
Chris: Hey, Jon. Hey, Rich.
Jon: Today, we’re going to talk about VPCs on AWS and getting set up for ECS but before we do that, what have you been up to this week, Chris?
Chris: Boy, I cannot believe I’ve now reached that point in life where I’m going on college visits with my son. This week, I started that process and just, again, kind of like scratching my head, wondering how we got here. Time has flown by and I’m getting ready to send one off to college.
Jon: Nice, and how did the college visit go?
Chris: It was great. It was an awesome experience. It was really cool to see his excitement and it definitely makes this idea of college much more tangible. I think it’s been a pretty abstract concept, but to actually walk around a college campus and see the classrooms and the labs and talk to the faculty, see the facilities, go visit the on-campus housing, it just made it all so real and pretty exciting and, like I said, making that just so much more tangible. It was very much well worth the time and this great experience.
Jon: Super fun. Very cool. How about you, Rich? What have you been up to?
Rich: Yesterday, we launched our largest product. It’s a science magazine. I think it has a couple of thousand uniques a day, which–so, even for traffic–
Jon: Are you talking about the Journal of Science, like the academic journal?
Rich: No, it’s nothing like that size. It’s still a culture dish.
Jon: When you said “science magazine”, I was like, “Whoa, that is hardcore. That’s awesome.”
Rich: “Thanks for taking the job up a notch.” It was an awesome project. It was originally in WordPress and so you’d think that that would actually be an easy transition but, because we re-architected everything, it’s actually probably harder than having to just take it from a totally different CMS, but we had the right timeline, we had the right roadmap and it was actually the smoothest launch that we’ve ever had, which is interesting. Knock on wood. We still have today and the weekend to make sure we didn’t miss anything huge, but they’re heading into a conference and so it was a hard deadline. I’m pretty stoked on it and looking forward to getting some larger projects like that in the future, which is pretty good.
Jon: All right, that sounds good. As for me, we’ve been in hiring mode and we’ve had somebody start this week so I’ve been working with that person, ramping them up and getting them accustomed to the idea of working with Docker and AWS. Let’s talk about that stuff. It looks like our title today is How to Set Up Your AWS VPC for Running Your Containers on ECS. We’ve talked a lot about ECS over the course of this podcast. We’ve touched on VPCs a couple of times, and I think a lot of listeners that have experience with AWS will know what they are, but it never hurts to start with a definition. What’s a VPC, Chris?
Chris: VPC stands for Virtual Private Cloud and so it’s kind of the basic unit of isolation from a networking perspective of building out your cloud-based infrastructure in AWS. You can think of it as an isolated collection of networks on which to place your–to instantiate and run your resources.
Jon: That makes sense. It’s computers and networks between those computers that only you can see and your competitor cannot get in there and see that stuff.
Chris: Correct.
Jon: You have it in our list of notes that we’re going through today. The next thing we’re going to talk about is their primary components so what kinds of things make up a VPC.
Chris: Right. By default, when you create an Amazon account, an AWS account, to make it so that it’s ready to go right from the get-go, they’ll create a default VPC for you and so you already have that set up for you if you want to use it.
Jon: That’s just conceptually interesting. I just want to interject because that means that when you’re working with AWS, you’re always working in a VPC. There’s no such thing as the public cloud where you’re intermingling with all the other AWS customers; you’re always in your own private cloud kind of like me every day, walking around.
Chris: Indeed, yes. You’ll have that default VPC set up for you. I think a lot of people end up using that. For various reasons, you may very well decide to create separate VPCs from the ground up and then, later, you can go back and remove that default VPC if you will. When you create a VPC, you’re going to decide what is the basic address space that you want to use and how big you want that to be and so kind of some design decisions there for you to just understand what is this particular VPC going to be used for, how big of an address range do you need to use, and that will kind of be informed by how you want to slice it up.
You can slice it up into sub-networks. For short, “sub-net” is what you’ll typically hear. You’ll slice up your VPC into these sub-nets, and one of the important characteristics of a sub-net is a sub-net can only exist in a single availability zone so a sub-net cannot span availability zones. At the very least, for availability reasons, you definitely want to have more than one sub-net defined inside your VPC just in case one of those availability zones has a failure and goes down.
Jon: I think by defining something with a new thing that we don’t know about, we may have lost me, we may have lost a couple of other people so can you just talk a little bit about what an availability zone is so that I can make sense of why I need multiple sub-nets?
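For readers following along, here is a minimal sketch of that slicing using Python's standard `ipaddress` module. The address block `10.0.0.0/16`, the zone names, and the helper `plan_subnets` are assumptions chosen for illustration; none of them come from the episode. The sketch only plans address ranges and does not call AWS.

```python
import ipaddress

# Hypothetical values for illustration only.
VPC_CIDR = "10.0.0.0/16"
AVAILABILITY_ZONES = ["us-west-2a", "us-west-2b", "us-west-2c"]


def plan_subnets(vpc_cidr, zones, new_prefix):
    """Carve one equal-sized subnet per zone out of the VPC address space."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of non-overlapping blocks
    # zip stops at the shorter input, so every zone gets exactly one block
    # and every block belongs to exactly one zone.
    return {zone: block for zone, block in zip(zones, blocks)}


subnet_plan = plan_subnets(VPC_CIDR, AVAILABILITY_ZONES, new_prefix=20)
for zone, block in subnet_plan.items():
    print(zone, block, block.num_addresses)
```

Because each subnet is keyed by a single zone, the plan mirrors the rule that a sub-net lives in exactly one availability zone.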
Chris: Sure. You can think of an availability zone as essentially like a datacenter. At the end of the day, we talked about the cloud and it’s like, “Oh, you can instantly provision things and spin things up and whatnot,” but, at the end of the day, these are very real computers that are sitting in racks and they have very real cables coming out of them and very real power requirements and cooling requirements and so there’s very much physical manifestations of this stuff and it has to sit somewhere. You group these things together into these building blocks, and availability zones are basically one way of describing those building blocks. You can think of it, again, as an availability zone is a datacenter.
Jon: I kind of like thinking of it that way because, just sort of intuitively, that makes sense. The datacenter is probably going to be online or offline like sort of if something kind of takes out the internet, it’ll take out the whole availability zone at once, the whole datacenter, so that kind of feels warm and fuzzy. I realize that it could be–who knows what it really is? It can be like a huge datacenter and each floor is its own availability zone. We don’t really know how they set things up, but we do know that there’s some physical proximity that the computers talk to each other when they’re inside an availability zone and so thinking of it as a building is helpful.
Chris: Yeah, and one thing we do know is that availability zones are definitely geographically disparate from each other. You’d never have two availability zones in the same building and, actually, in practice, an availability zone is not necessarily in a single physical location either. They can define an availability zone, and we say it’s a datacenter, but it actually could be spread over several buildings, all within a couple of miles of each other or something like that.
Jon: Close enough to have super-fast networking among all the machines in it but far enough away from other availability zones that if something takes out the availability zone, it won’t take out the other one.
Chris: Correct. Right. Yeah, and these availability zones do have outages. They’re not supposed to but life happens. It’s infrequent but when it does, the internet completely lights up. People panic because that’s when a lot of these folks that perhaps didn’t really think through all the implications of what does it mean for an availability zone to go down like, “Is my application going to still run?” and then they find out, when it does go down, that, “No, it doesn’t. There were some things here we didn’t take care of.”
Jon: Then, the thing you said earlier was that sub-nets cannot cross availability zone boundaries. You also said you should have more than one sub-net in your VPC but the reason that I think you were saying that is because you were trying to suggest you should have multiple availability zones in your VPC. You could have two sub-nets in the same availability zone, but that’s not the point you were trying to get at; you were trying to say, “Let’s have at least a couple of sub-nets and one or two or more on each availability zone.” Did I get that right?
Chris: Yeah, absolutely. In that instance, definitely, the thought process was you want to have more than one sub-net just because of availability reasons. You want to be built on more than one AZ because AZs can go down so make sure you have that. There’s other reasons for having multiple sub-nets, and we’ll get into those reasons as well. Spreading yourself across multiple AZs is definitely one of those fundamental basic reasons for having multiple sub-nets.
Jon: Right on. I get it. Where do we go from here?
Chris: Maybe we should check in with Rich. Rich, any questions on that?
Rich: I’m just trying to follow along.
Chris: Right, so we have our VPC, we’re now dividing it up into sub-nets, and these sub-nets are slices of that address space. Again, you can kind of decide how big you want these sub-nets to be. Is it going to be space for a few machines or is it going to be a thousand machines or 4,000 machines, 4,000 IP addresses or whatnot? You can slice it up. You’re also going to define what region your VPC is in so maybe that’s something we should talk a little bit about as well.
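The sizing question can be worked out with a little arithmetic. The sketch below is an illustration, not something from the episode: the helper names are hypothetical. It relies on two AWS rules: a subnet must be between /16 and /28 in size, and AWS reserves five addresses in every subnet (the first four and the last one).

```python
AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one in each subnet


def usable_addresses(prefix):
    """Addresses left for your own machines in an IPv4 subnet of this prefix length."""
    return 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(hosts):
    """Longest prefix (smallest subnet) that still fits `hosts` machines."""
    needed = hosts + AWS_RESERVED_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # smallest n with 2**n >= needed
    prefix = 32 - host_bits
    if prefix < 16:
        raise ValueError("more addresses than a single /16 subnet can hold")
    return min(prefix, 28)  # /28 is the smallest subnet AWS accepts


for hosts in (10, 1000, 4000):
    prefix = smallest_prefix_for(hosts)
    print(hosts, "machines fit in a /%d with %d usable addresses" % (prefix, usable_addresses(prefix)))
```

So "4,000 machines" lands on a /20, which is one sixteenth of a /16 VPC.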
We talked about availability zones and that corresponds roughly to a datacenter. A region is a collection of two or more availability zones that are, again, kind of geographically similar in the same geographic area, broadly. They’re not next to each other, necessarily, but they’re close enough and within hundreds of miles type of thing or less. A region would have multiple of these AZs within it. I think there’s multiple regions in each one of these markets. In the US, I believe there’s three regions now. In Europe, I believe there’s three regions. There’s Australia, there’s regions in Asia and the rest of the world and whatnot.
You’re going to define which region you want your VPC in. Again, most regions will have at least two AZs if not more, and so that’s something to consider when you are building out your VPC and how many sub-nets you have. Definitely, spreading yourself across at least two AZs is important but if your region supports more than two AZs, then that’s something very much to consider. It’s almost like, “Why wouldn’t you?” If your region has three AZs, why wouldn’t you go ahead and create sub-nets in each one of those AZs? Then, you can kind of think if you want to further protect yourself from failures, like what happens if two AZs go down? The odds of that happening are incredibly low, but just, again, something to think about as you build this out.
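One way to reason about "what happens if two AZs go down" is to write the placement down and check it. This is a toy sketch under stated assumptions: the subnet and zone names are made up, and the helper functions are hypothetical rather than any AWS API.

```python
# Hypothetical placement: subnet name -> availability zone.
SUBNET_ZONES = {
    "subnet-a": "us-west-2a",
    "subnet-b": "us-west-2b",
    "subnet-c": "us-west-2c",
}


def surviving_subnets(subnet_zones, failed_zones):
    """Subnets that are still reachable after the given zones go down."""
    failed = set(failed_zones)
    return sorted(name for name, zone in subnet_zones.items() if zone not in failed)


def tolerates_zone_failures(subnet_zones, failures):
    """True if at least one subnet survives any `failures` simultaneous zone outages."""
    distinct_zones = set(subnet_zones.values())
    return len(distinct_zones) > failures


print(surviving_subnets(SUBNET_ZONES, ["us-west-2a"]))
print(tolerates_zone_failures(SUBNET_ZONES, 2))
```

Two subnets in the same zone count as one zone here, which is why the check looks at distinct zones rather than the number of subnets.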
Jon: The assumption I’m hearing that I just want to validate is that while we can cross AZs with our VPC, we cannot cross regions.
Chris: Correct.
Jon: Okay, that was my question. We pick a region. The region is usually just picked by what’s convenient or where most of our users are. If they’re mostly on the west coast, we might choose California or Oregon. If they’re mostly on the east coast, we might choose Ohio or Virginia. Then, after we’ve picked our region, where do we go from here?
Chris: After sub-nets, some of the other components that make up a VPC that you’ll need to think about will be things like how do you give access to the outside world to this networking. Up until this point, you can kind of think–really kind of abstractly, you can just think of this as you now have a network switch that you can plug a bunch of cables into and hook up computers to it, but that’s about it. Think of it like a LAN or something like that. Now, how do these things actually go talk to the outside world? How do they make requests to a Google API or how do they send email or whatnot? They need to have internet access.
Then, you have both sides of that. You have to think about it on things from the internet being able to connect to those machines, so the inbound access, and then you also have the, “How do the machines themselves make calls out to the internet?” so the outbound access. Both of those types of functions have different pieces that you’ll need to deal with to set it up. For that inbound access, you’ll want to set up something called an internet gateway. An internet gateway basically is a proxy server, and it is the entry point for other computers out on the open internet to go when they want to talk to those computers inside your private cloud, your VPC.
They’ll go through this proxy so it’s a special thing. It’s called an internet gateway, and that’s something that is incredibly easy to set up and to use with AWS. It’s, again, very easy in that it is a managed service provided by AWS so you don’t have to worry about availability. For the most part, you don’t have to worry about scalability because this is, again, a managed service that AWS provides that it’s built out with all these considerations.
Jon: That actually makes me want to ask just a really rudimentary question. Here we are making a VPC and, as far as I know, we haven’t even put any computers in our cloud. We just sort of defined it’s going to be in these availability zones in this region and here’s the IP address space that it’s going to take up. The next thing you were going to say is, “Here’s how you can talk to it from the outside world. Let me just attach an internet gateway to it,” but we haven’t actually put any EC2 instances into our VPC there; it’s just an empty network, right?
Chris: Yeah. You can think of it as like we’re building this big condominium building and so you start with the building itself to hold all these individual condominiums and you build out the individual condominiums and so each floor, you can think of them maybe as a sub-net. Then, you need things like, “How do you turn on the lights?” so you get the building wired with electricity, and phone, and other utilities, and plumbing and whatnot. That’s kind of like what we’re talking about here. Then, those EC2s, that would be like when the residents actually come and move in.
Jon: Okay, I got it. The reason I asked is because an internet gateway, if you have done networking before the AWS world, it was a computer and you had to set it up and you had to put together routing tables on the thing. In this case, since it’s a managed service, it’s just like a conceptual internet gateway almost, like we’re just saying, “Hey, here’s how you get in,” and we’re attaching an “internet gateway” to our VPC.
Chris: Right. Yeah, it’s a managed service from AWS. You can set up a single internet gateway for your VPC and that’s all you really need to know. I think the throughput on that is something like 10 gigabits a second. Again, it’s one of those things that’s super-nice in that there’s not a lot that you have to do. It’s so common. It’s definitely one of those things that the folks at AWS have focused on and optimized and just made it a lot easier than what it has been in the past.
Another thing that we can talk about is just the routing tables and whatnot. Defining routing tables is another component of every VPC setup. You’ll define what routing table each subnet should be using, and that routing table kind of dictates what kind of traffic patterns are allowed. If you want your subnet to be able to reach the internet and then also have the internet reach it, then when you create that internet gateway, you can add the route to the internet gateway to that routing table.
That basically just says, “Okay, basically, this subnet, when it makes requests, in addition to being able to make requests to everything that’s within this sub-net, it can also make requests to the internet gateway itself.” It’s defining that hop so that you can say, “Okay, I’m going from my sub-net to the internet gateway,” and then, from there, it can now go to the outside world.
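To make the "hop" idea concrete, here is a small simulation of a route table lookup. It is a sketch under assumptions, not AWS code: the CIDR block and the target name `igw-example` are made up, and `next_hop` is a hypothetical helper. It does reflect how VPC routing picks a route: the most specific matching destination wins, the built-in `local` route covers the VPC's own address range, and `0.0.0.0/0` is the catch-all route you point at the internet gateway.

```python
import ipaddress

# Hypothetical route table for a subnet with internet access.
ROUTES = [
    ("10.0.0.0/16", "local"),      # traffic that stays inside the VPC
    ("0.0.0.0/0", "igw-example"),  # everything else goes to the internet gateway
]


def next_hop(routes, destination):
    """Return the target of the most specific route matching `destination`, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet has nowhere to go
    # Longest prefix match: the route with the largest prefix length wins.
    return max(matches)[1]


print(next_hop(ROUTES, "10.0.5.9"))
print(next_hop(ROUTES, "8.8.8.8"))
print(next_hop(ROUTES[:1], "8.8.8.8"))
```

Dropping the second route, as in the last call, models a subnet whose route table has no path to the internet gateway.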
Jon: Has AWS created kind of a GUI to make it fairly intuitive how to do that or is it sort of the way it’s always been where you have to understand a UNIX configuration file type of thing?
Chris: This is configurable via the AWS console. Again, it’s pretty straightforward especially since the standard use case scenario is really just very, very–there’s not too much to it because you’re basically just going into a route table and saying, “Add this one route,” and you can say what the destination is and you’ll see the internet gateway will pop up as one of those options. Select that and then you’re done, and that’s all you have to really do and, now, internet access works on that particular routing table. Now, that routing table can be associated with more than one subnet so a subnet can’t be associated to more than one routing table but the converse is allowed.
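That one-to-many rule is easy to picture as a small data structure. The class below is a hypothetical illustration with made-up IDs, not an AWS API: keying the mapping by subnet means a subnet can only ever point at one routing table, while any number of subnets can point at the same table.

```python
class RouteTableAssociations:
    """Tracks which routing table each subnet uses."""

    def __init__(self):
        self._table_for_subnet = {}  # subnet id -> route table id

    def associate(self, subnet_id, route_table_id):
        # Re-associating replaces the old entry, so a subnet never has two tables.
        self._table_for_subnet[subnet_id] = route_table_id

    def table_for(self, subnet_id):
        return self._table_for_subnet.get(subnet_id)

    def subnets_using(self, route_table_id):
        return sorted(
            subnet for subnet, table in self._table_for_subnet.items()
            if table == route_table_id
        )


associations = RouteTableAssociations()
associations.associate("subnet-a", "rtb-public")
associations.associate("subnet-b", "rtb-public")   # same table, second subnet
associations.associate("subnet-a", "rtb-private")  # replaces subnet-a's table
print(associations.table_for("subnet-a"))
print(associations.subnets_using("rtb-public"))
```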
Jon: Interesting, so if you have a couple of subnets to set up and you’ve set up your routing table and you really want the same routes applied to both subnets, you could just say, “Hey, use this one.” Got it. Definitely, there’s so much. Let me just take a pause here because we might actually have to finish this episode up pretty soon, too, but there’s so much of this that really depends on having a fundamental understanding of TCP/IP networking. When you hear words like ‘routing table’ and if you haven’t spent some time learning how TCP/IP networking works, then that’s pretty nuts, like, “What is that? What is it for?” and we don’t have time to really explain TCP/IP networking.
What we should say is that it’s kind of a fundamental thing to really understand as a software developer. If, for some reason in your education, you missed that piece, it’s worth spending a week really reading about it and coming to understand it because it’s always going to be useful. It’s always going to be something that you’ll rely on throughout your career.
Chris: That’s a great point and it’s just so true. I look back over the past 20-25 years and there’s been so many times where it’s like I had to go set up a firewall, and it was just out of necessity. It was just like, “Oh, I needed to have secure access between my home office and someone else’s home office and we wanted to make sure it was secure,” so we went and bought a Belkin switch that also had VPN capabilities inside of it. What does that mean to set it up? You go through the manual, you play with it and you’re just kind of forced to understand it at a level of, “Okay, what is a network switch doing, and ports, and forwarding rules? What is a VPN? How does that work? Tunneling?”
The amazing thing is, like you said, it’s still applicable. All these concepts, they have not changed. This is the fundamental way of how networking works, and that’s not changing until we go to photon packets or something like that. It’s like we’re quantum-computing or something, like this is the way it works. I’m super-glad and fortunate that I went through all those experiences because it does make things like figuring out how VPCs work and understanding things like firewalls, and gateways, and network address translation, which is another thing we’ll talk about later, like how all that stuff works. These are the same things that were the exact same way 15-20 years ago.
Jon: Right. I think that’s a great place to wind it up for this week and then we’ll just jump back in to talking more about setting up VPCs for your containers on ECS next week.
Chris: Awesome. Thanks, guys.
Jon: Thank you.
Chris: Later.
Jon: Bye, Rich.
Rich: Well, dear listener, you’ve made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode along with show notes and other valuable resources is available at Mobycast. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.