March 14, 2018

01. Virtual Machines vs. Containers

Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache discuss the differences between virtual machines (VMs) and containers. What are the pros? Cons? Similarities? Differences?

Highlights:

Chris, Jon, and Rich describe their backgrounds and how they came to use the Cloud and containers.
Many people understand that Docker is lighter weight than a VM – but that’s where their knowledge may end. They don’t know what that lightness means and why it’s important.
You can use Docker without a deeper understanding of how it works and how it differs from a VM, but a detailed understanding helps you weigh the pros and cons and spot other important considerations, such as security.
VMs virtualize the entire machine, including the operating system and all hardware device drivers for access to network, storage, and I/O devices. They allow multiple VMs to run on the same physical machine, which can raise the utilization of that machine's hardware resources, such as CPU cycles.
A VM’s hardware virtualization requires a hypervisor: a layer that sits between the VMs and the actual hardware. It’s like an “OS of OSes” that mediates access to all of the physical resources on the machine that may be shared among multiple VMs.
VMs have a performance penalty versus bare metal because all hardware interactions must go through the hypervisor.
Modern CPUs have some built-in support for hypervisors (for example, Intel VT-x and AMD-V). A virtual machine thinks it’s talking directly to hardware, but it’s actually talking to the hypervisor.
Hypervisors set up barriers to isolate different VMs from each other, for example by allocating a different memory space to each VM.
Once an operating system is installed on a machine, it owns that hardware. The only way to have multiple operating systems running at the same time on a single piece of hardware is through VMs.
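Containers are similar to VMs. They allow you to run multiple instances of software on the same physical hardware, and they let you specify an operating system image for the container (for example, a particular Linux distribution), even though all containers on a host share that host's kernel.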
Containers virtualize the operating system, but NOT the hardware.
One drawback to VMs is that the hardware virtualization is replicated for every VM running on a host machine. Containers running on the same machine all share a single operating system kernel and a single set of (possibly virtualized) hardware.
Operating systems run applications, such as Microsoft Word. One way to picture a container is as an application whose job is to behave like a whole computer, while relying on the host operating system for things like network, disk, and memory access.
The operating system allows applications (in this case, containers) to be isolated from each other. For example, the OS can give each container its own namespaces (process IDs, network, mounts, and so on), which restrict what it can see. Control groups (cgroups) limit the resources that a process or group of processes can use: CPU, memory, disk I/O, and so on.
The Docker platform, together with the host operating system, provides the services that let individual containers reach the hardware without replicating hardware drivers inside each container.
Containers allow more applications to run on the same physical machine than VMs do – meaning better hardware utilization.
Containers start and stop much faster than VMs because their virtualization footprint is much smaller: containers can start in seconds, versus minutes for a VM. This means higher productivity and less downtime for updates.
Container image files can be orders of magnitude smaller than VM images – megabytes instead of gigabytes. This makes it far easier to send containers across networks.
Containers do not require VMs. You can run containers on bare metal servers, which is one way to squeeze even better performance out of your application and achieve even better hardware resource utilization.
Performance: Containers affect the performance of running apps less than VMs do, but there is still some impact on pure CPU efficiency, perhaps in the range of 20% versus a native application. (That 20% figure is a bit of a wild guess.)
For example, an app that does a lot of context switching leans heavily on the operating system to decide what runs next, so its performance impact will be worse than that of an app with less context switching.
Many apps, including most web applications, are I/O bound, not CPU bound, which means there will likely be no significant performance difference between a containerized app and the same app running natively on the OS.
Deciding between VMs and containers is influenced by multiple factors, including performance requirements (for example, in high-performance computing), scalability, and security. We’ll discuss security considerations in future episodes of the podcast.
Bottom line: VMs provide an improvement in hardware utilization and scalability of compute resources, and Containers take those advantages to the next level, with very few downsides.
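The claim that I/O-bound apps barely notice container overhead can be checked with back-of-the-envelope arithmetic. This sketch assumes, as a simplification, that the overhead only slows down the CPU-bound share of a request and leaves time spent waiting on the network or disk untouched. The 20% figure is the episode's own rough guess, and the workload shares are made up for illustration.

```python
def end_to_end_slowdown(cpu_fraction: float, cpu_overhead: float) -> float:
    """Fractional increase in total time when only the CPU-bound share of the
    work is slowed down by cpu_overhead (e.g. 0.20 for 20%)."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    # Time spent waiting on network or disk (1 - cpu_fraction) is assumed unaffected.
    return cpu_fraction * cpu_overhead


ASSUMED_CPU_OVERHEAD = 0.20  # the episode's own rough guess

# Hypothetical workloads: share of wall-clock time spent on the CPU.
workloads = {"CPU-bound batch job": 0.95, "I/O-bound web request": 0.10}
for name, cpu_share in workloads.items():
    pct = end_to_end_slowdown(cpu_share, ASSUMED_CPU_OVERHEAD) * 100
    print(f"{name}: about {pct:.0f}% slower end to end")
```

With these made-up numbers the CPU-bound job comes out about 19% slower, while the I/O-bound web request is only about 2% slower.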

Jon Christensen 00:00
So the roles here with this MobyCast would be that hopefully we’ll look to Chris to be the expert on things and explain them to me, and hopefully I can kind of take what Chris says and make it into stuff that’s digestible for Rich. And through that process, our hope is that people listening might be able to gain a little extra knowledge. So that’s what we’re doing here.

Jon Christensen 08:36
So today in this first one, like I said, we’re going to be talking about the difference between VMs, virtual machines, and containers. And this idea came up because we were sitting around and discussing what kinds of things we should talk about on this podcast, and we decided that one of the things that we hear from people when we talk about what is Docker, what are containers, is they say, “Oh, well, they’re like– they’re kind of like virtual machines, but they’re lighter weight.” And we hear this from everybody, from people that have very little technical experience, all the way through to very senior developers that are actively using Docker in major deployments with hundreds and even thousands of machines. And so our hypothesis is that that’s where people’s information and knowledge stops, that they kind of get this idea that Docker is a full operating system, but lighter weight than a virtual machine, but not specifically what makes it lighter weight and whether– like what that lightweight means and how it’s important. So I guess the first question – and I’ll ask this to you, Chris – is does it matter? I mean, people are using Docker and they’re using cloud services – Azure, and AWS, and Google Cloud – without really knowing the answer to this question. So does it matter if people know the answer, do you think it’s important at all?

Chris Hickman 10:08
So, yeah, I think there’s a couple of facets to the answer to that. So on one hand, if your goal is just like you’re drinking the Kool-Aid, everyone’s telling you that Docker is a great thing – and by and large, it is – you can go ahead and use the technology, and not really have that deeper-level understanding of like, “Okay. Why is it better? What are the true benefits, why am I doing this?” And I think that’s actually the reality, right? We were talking about this where the common refrain is, “What’s the difference between virtual machines and containers?” And it’s like, “Oh, containers are lighter weight. And so therefore, they’re better.” And I don’t think most people really understand just why that is, and what it means, and what the consequences are. So having that deeper understanding, really kind of peeling that back a little bit and understanding the differences between them, it does have some great benefits. One, it allows you to really understand, “What are the gains I’m getting, what are these true benefits and how well are they going to apply to me in my situation?” And it also will kind of open up some opportunities for you to understand some of the more nuanced features or issues that will pop up, as you switch between virtual machines and containers. Kind of specifically around things like security, so you have some different security considerations that you have. There’s a bit more extra work that you have to do when you switch to containers. And kind of understanding what the difference is between virtual machines and containers, will help you understand a bit more why there are security issues and how you have to deal with that. So I think at the high level, you don’t really have to go– it’s not necessary to go super duper deep in this stuff, but I think it’s super helpful, right? And it’s going to make you just that much better at using the technology, and using it in a way that fits your need as best as possible.

Jon Christensen 12:23
Cool. I think that’s convincing enough. I think that I’m ready to find out more and really come to understand the difference. I’m curious from you, Rich, whether you have ever used a virtual machine directly. Have you ever installed your own virtual machine and played with it on a computer, that you can touch and feel with your hands? Or has your exposure been to virtual machines, mostly just been spinning up things like AWS instances in the cloud?

Rich Staats 12:53
Yeah. I had Parallels on my Mac when I was doing mostly SEO stuff, where those tools were only available on PCs. So I have my experiences more from that point of view with a virtual machine. I’m familiar with the context of what it is. It becomes less of a– I understand it much less once we start talking about it in a more complex, in cloud-type scenarios.

Jon Christensen 13:21
Right. So, interestingly, you could kind of picture Docker as being, “Oh, I have Parallels on my machine, and I can install a bunch of OSs on there and have them all running at once.” But it’s somehow lighter weight than that.

Rich Staats 13:33
Yeah. Which is why I think I have a hard time wrapping my head around the difference. It’s that there’s so many similarities between the two.

Jon Christensen 13:40
Sure. Sure. Okay. Well, I guess let’s dive in a little bit. So the way I was thinking about doing this, Chris, is I was going to first set– I was going to look at each one. I was going to ask you to look at each one individually, like let’s talk about what a VM is. And then after we talked about that a little bit, let’s talk about what a container is. And then after that, let’s talk about how they’re different, but I’m open to a different approach. What do you think? Is that a good way to go, or did you have another way you wanted to talk about the differences between the two?

Chris Hickman 14:10
No. I think that’s a great approach. Kind of like first just– I mean, it’s kind of hard to understand the differences if you don’t really understand [laughter] what the individual pieces are, right? So I think that’s a great place to start.

Jon Christensen 14:21
Okay. So let’s start with what’s a VM.

Chris Hickman 14:25
Sure. So virtual machines, at their essence, really what it is is it’s literally in the name, right? You’re virtualizing an entire machine. So this includes the hardware itself. The entire thing is being virtualized. And so what does this give you? By doing this, you now allow yourself to have multiples of these virtual machines running on the same physical machine, right? So it’s an abstraction layer that allows you to get better use of your resources. And so where this really took off when virtual machines first came out where people would go and have these very beefy hardware systems, these servers, lots of computing resources, very fast CPUs and lots of memory and storage space and what not. Then you go and install your operating system on that and you can support a certain number of users, but that machine ends up being kind of underutilized. And so by going to this technology of virtual machines, that it now allows you to basically have this compaction of resources that you can now – to your end users, to the rest of your team – it looks like they have three servers that they kind of connect to. When in reality, it’s just a single one. So I think the important thing to keep in mind with virtual machines is that it’s virtualization at the hardware level, you’re virtualizing the entire server itself.

Jon Christensen 16:12
You’re making me realize that we should maybe even back up one step from that, and talk about the machine itself and an OS itself. So we have a computer out there, it’s got some hardware on it and we want to put an operating system on it. And I think that when we put that operating system on it, we basically are saying, “Hey, here’s a disk, it’s got some machine code in it. Hardware, when you start up, go read from this machine code and this is going to be what tells you everything you need to do.” So that’s how operating systems get on the machine in the first place, right? Is there a better way to say that than what I just said?

Chris Hickman 16:54
No, I mean, at the end of the day, that’s exactly what it is. A computer is very much a physical thing. It’s composed of memory and CPU and disk drives, and mice and keyboards and video monitors. These are all physical pieces of hardware, and there’s got to be some way that the computer itself can talk to these things. And so the operating system is the core– the brain, if you will, of the computer. It’s what kind of is providing that environment for interfacing with all these things. There’s these pieces of code inside the operating system called device drivers, and that is kind of low level code that bridges that interface between these hardware devices and the computer itself and the operating systems. So that you can have software that allows you to type on a keyboard and have those characters show up on your screen. There’s software–

Jon Christensen 18:12
And this is something that you probably can relate to, Rich, is that when a computer starts up – especially an old Windows computer, not so much anymore with Macs – but one of the first things you see on the screen is the screen is black and there’s some white text on there, it says bootstrapping. And what that really means is that the chip knows that it’s got to go out somewhere to a place and get some instructions that tell it, “From here on out, you’re listening to these instructions and you’re running these instructions. And everything you do, comes from this place.” And the bootstrapping is going and getting all those instructions and putting them in memory, so that forevermore from that point on, those are what control the behavior of the chip. And I think that kind of helps set the stage for how an operating system kind of gets into a machine, and runs it and owns it. And maybe from there, we can figure out what’s the difference between that and a virtual machine.

Rich Staats 19:13
Yeah. I think that does do a really good job of explaining, and I think that it’s actually probably pretty necessary for a lot of people who just make that assumption, but just don’t really have any experience with that low-level code.

Jon Christensen 19:26
Sure, sure. So before I continue, I want to ask Chris– because I’m never completely sure when I try to get a little bit technical. I’m never completely sure if I said things correctly. Chris, did I just characterize that in a way that seems fair?

Chris Hickman 19:45
Yeah. Absolutely. Probably even more technical, I think you’re talking more like BIOS and startup sequence. And, absolutely, like that’s when software starts coming online, it’s got to figure out like, “Okay, I need storage. How do I talk to that?” So there’s low-level code that has to get loaded, so that it knows to go talk to this disk and what protocol to talk to it. So I think, absolutely, the [inaudible] in that is that it’s complicated talking to these hardware devices from the actual machine itself. And it’s just a very fundamental part of what it means to be a computer, right? Your computer’s not going to do much if it can’t talk to storage or it can’t talk to input devices or output devices, so that communication with the hardware is super, super important.

Jon Christensen 20:40
Right. And it’s also really critical that when the hardware gets its kernel or its operating system, that I said sort of owns the hardware from that point forward, then it really is the case that you couldn’t run some program that says, “Okay. Now, I’m in charge. Forget everything you might have loaded up during your bootstrapping phase. I’m in charge of the computer now, and I can do whatever I want with this silicon.” So it’s really critical that that cannot happen.

Chris Hickman 21:09
Yeah. That’s called a virus [laughter] or malware.

Jon Christensen 21:12
Right. Right. Right. But I mean, even worse than that, because those things can run within an operating system. What I’m saying is that some other program cannot become the operating system. And so the transition I was trying to make was hardware virtualization. So I think– and, again, I’m kind of guessing here, but I think the way that that works is essentially the chip, the main CPU on board the computer, is able to say, “Hey, I can listen to multiple different operating systems. I can have multiple different operating systems telling me what to do. But when I do that, one of those operating systems has to be in charge. There has to be a main master operating system, and I’m always going to come back to that one to make sure that I’m doing what it says, first and foremost, and the other ones always get sort of relegated to second-tier status.” Does that sound like it’s probably accurate?

Chris Hickman 22:13
Yeah. I think basically we’re starting to get into some of the kind of the technical ways that virtual machines work. But in a way, you can kind of think of it as– we talked about how kind of like an operating system is responsible for allowing the computer to talk to the hardware. With virtual machines, you end up kind of having an operating system of operating systems. And that operating system, that meta-operating system, is actually– it’s called a hypervisor, and it provides that layer between the hardware and the operating systems that run on top of it. All right? So it’s–

Jon Christensen 22:50
Oh, yeah. It’s hypervisor.

Chris Hickman 22:51
So there’s only one thing really talking to the hardware, it’s the hypervisor. But it’s providing services that allow operating systems to believe that they have direct access to the hardware, when in actuality they don’t. They actually have direct access to the hypervisor. So it’s this broker. It’s this intermediary in between them. So the hardware itself, it doesn’t care, like it’s just things are talking to it. And CPUs, they’re not really kind of like the same thing, although there are certain features there that kind of give you better performance and more features with some virtualization capabilities built into the CPUs themselves. But for the most part, you can think of it that way. That it’s this layer that’s on top of it, the hypervisor. Virtual machines don’t exist without the hypervisor, right? That is really what a virtual machine comes from, is this creation of this hypervisor software that is managing the hardware itself, and then providing an interface via services to operating systems to hook into it.
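To make the “broker” idea concrete, here is a toy model in Python. `ToyHypervisor` and `PhysicalDisk` are hypothetical names invented for this sketch, and real hypervisors do this with virtual disk images and emulated or paravirtualized devices rather than a dictionary. What it does show is the shape of the arrangement Chris describes: each guest believes it owns block 0 of its own disk, but only the hypervisor ever touches the one physical disk, and it decides where each guest’s reads and writes really land.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDisk:
    """Stands in for the single real device; only the hypervisor touches it."""
    blocks: dict = field(default_factory=dict)


class ToyHypervisor:
    """Hypothetical broker between guest operating systems and one disk."""

    def __init__(self, disk: PhysicalDisk, blocks_per_vm: int):
        self.disk = disk
        self.blocks_per_vm = blocks_per_vm
        self.slots = {}  # VM name -> which slice of the physical disk it owns

    def register_vm(self, name: str) -> None:
        self.slots[name] = len(self.slots)

    def _physical_block(self, vm: str, guest_block: int) -> int:
        # The guest asks for "its" block N; the hypervisor decides where that really is.
        if not 0 <= guest_block < self.blocks_per_vm:
            raise PermissionError(f"{vm} tried to reach outside its virtual disk")
        return self.slots[vm] * self.blocks_per_vm + guest_block

    def write(self, vm: str, guest_block: int, data: bytes) -> None:
        self.disk.blocks[self._physical_block(vm, guest_block)] = data

    def read(self, vm: str, guest_block: int) -> bytes:
        return self.disk.blocks.get(self._physical_block(vm, guest_block), b"")


disk = PhysicalDisk()
hv = ToyHypervisor(disk, blocks_per_vm=8)
hv.register_vm("windows-guest")
hv.register_vm("linux-guest")

# Both guests write to what they each believe is block 0 of their own disk.
hv.write("windows-guest", 0, b"windows boot data")
hv.write("linux-guest", 0, b"linux boot data")

print(hv.read("windows-guest", 0))  # b'windows boot data'
print(hv.read("linux-guest", 0))    # b'linux boot data'
print(sorted(disk.blocks))          # [0, 8]: two different physical blocks
```

The bounds check in `_physical_block` is the toy version of the isolation a real hypervisor enforces: a guest can only name blocks inside its own virtual disk.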

Jon Christensen 23:57
I mean, I don’t want to test your knowledge of hypervisors, but I do want to understand them better. So do you know if they come from– are they on the chip? Are they in some memory that the chip has access to? Is Windows still on this– is Microsoft making the hypervisor? Where does the hypervisor come from?

Chris Hickman 24:22
Yeah. It’s very much platform-specific. Microsoft, they have a hypervisor. I believe it’s called the Windows Hypervisor. And that’s key because–

Jon Christensen 24:40
So when you install Windows directly onto a computer, it first puts a hypervisor on and then maybe even puts Windows on almost as like a VM. Like your very first Windows installation might actually be kind of a VM?

Chris Hickman 24:53
Probably not, for performance reasons. But–

Jon Christensen 24:59
That’s actually a little confusing to me that you just said probably not for performance reasons, because– I don’t know, I just was under this impression that VMs these days basically got to talk as directly as you could imagine to a chip, so that they didn’t really suffer performance hits.

Chris Hickman 25:16
Well, remember, there is that layer between them. So the hypervisor sits apart from this. So you’re absolutely making– like you can think of it as an extra hop. You’re going through some intermediary. So it will always be– you’re going to pay some performance penalty to make that bridge.

Jon Christensen 25:34
Okay, cool. Makes sense.

Chris Hickman 25:36
I’m not 100% sure on this. But I’m pretty certain that when you install Windows Server, that that’s actually not being run through the hypervisor, that it’s actually talking directly to the hardware. But I could be wrong.

Jon Christensen 25:52
But it’s telling me that server does lay down a hypervisor as well, right?

Chris Hickman 25:55
Correct. Absolutely. And just like your Mac has its own hypervisor, so that comes with Mac OS. And likewise, the same thing with Linux.

Jon Christensen 26:09
I guess the thing that sort of still is a bit confusing to me is that somewhere around 2005 or ’06 – maybe even a little bit later than that – Intel chips and AMD chips started supporting the hardware-based virtualization. And what I imagined was that the chip itself was somehow able to sort of say, “Now I’m listening to you, operating system. Now I’m listening to you, operating system,” and that baked into the chip was some [on Silicon] way of being virtualized. And maybe we just need to research that a little bit.

Chris Hickman 26:49
Yeah, I think that’s true, that that’s kind of getting definitely a bit deeper. But that is necessary in order for a lot of these hypervisors to actually work, is they need some cooperation from the CPU itself. I’m pretty sure that that feature basically just allows you to basically namespace the CPU and memory, right? So you’re actually putting in these barriers, and that’s necessary to do the virtualization. You say, “I’m going to run these three VMs and they can’t share the same address space, so I’m going to go ahead and–” so I can’t allow like the CPU instruction running in this one VM, shouldn’t co-mingle with the one in the other VM. So there’s some chip support there for providing that namespacing and that sandboxing. Again, you can think of it as–
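The CPU-assisted “barriers” Chris mentions are, roughly, an extra layer of address translation. Below is a toy sketch of that idea, not how any real chip or hypervisor implements it: the hypothetical `ToyMemoryManager` keeps a separate table per VM that maps the VM’s own (guest) page numbers onto host pages drawn from a single free list, so two VMs that both use address `0x10` still end up in different host pages. On real hardware, features such as Intel’s EPT and AMD’s nested paging let the CPU walk tables like these itself.

```python
PAGE_SIZE = 4096


class ToyMemoryManager:
    """Hypothetical sketch of per-VM memory isolation.

    Each VM gets its own table mapping guest page numbers to host page
    numbers. Host pages come from one shared free list, so no host page is
    ever handed to two VMs at once.
    """

    def __init__(self, host_pages: int):
        self.free = list(range(host_pages))
        self.tables = {}  # VM name -> {guest page: host page}

    def create_vm(self, name: str) -> None:
        self.tables[name] = {}

    def translate(self, vm: str, guest_addr: int) -> int:
        """Map a guest-physical address to a host-physical address."""
        page, offset = divmod(guest_addr, PAGE_SIZE)
        table = self.tables[vm]
        if page not in table:
            if not self.free:
                raise MemoryError("host is out of pages")
            table[page] = self.free.pop(0)  # first touch: back it with a host page
        return table[page] * PAGE_SIZE + offset


mm = ToyMemoryManager(host_pages=16)
mm.create_vm("vm-a")
mm.create_vm("vm-b")

a = mm.translate("vm-a", 0x10)  # vm-a's page 0 -> host page 0
b = mm.translate("vm-b", 0x10)  # vm-b's page 0 -> host page 1
print(hex(a), hex(b))           # 0x10 0x1010
```

Both guests used the same address, but the host addresses differ, so neither VM can read or overwrite the other's memory.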

Jon Christensen 27:50
But the chip is not providing the hypervisor. The chip is not doing [crosstalk].

Chris Hickman 27:54
No. No. Absolutely not. No.

Jon Christensen 27:57
Cool. Interesting. Well, I think that’s pretty good. I guess maybe the one other thing we should talk about in terms of what is a VM, and maybe you’ve already covered this completely, but is there anything else we need to talk about in terms of how VM accesses hardware like keyboards, and screens, and mouses, and stuff like that, or–

Chris Hickman 28:20
Yeah. Other than just–

Jon Christensen 28:21
–are we ready to move onto containers?

Chris Hickman 28:23
Yeah. I mean, I think the big thing here is that the virtual machine is virtualizing the entire server. So as far as the VM’s concerned, it thinks it’s talking to the hardware directly, but it’s not. It’s actually going through the hypervisor. But again, for all intents and purposes, it doesn’t know that. So it includes all that support for it. Right. So it’s virtualizing the entire server with the virtual machine.

Jon Christensen 28:48
Right. So Windows running in Parallels, you have to install all the Windows drivers, everything that you would have to install if you were installing Windows on a bare machine.

Chris Hickman 29:01
Absolutely.

Jon Christensen 29:02
Cool. So are you keeping up, Rich?

Rich Staats 29:05
Yeah. Yeah. I am.

Jon Christensen 29:07
Have we made this clear? Okay. Good.

Rich Staats 29:08
I think that the big piece that I’ve taken away from it, is that when you get this machine and you decide to put an operating system on it, then that’s the operating system and you can’t have another operating system on that machine. So the only way that you can have multiple operating systems running on a single piece of hardware is through a virtual machine.

Jon Christensen 29:26
Okay. Yep. Yep. That’s exactly right. All right. So let’s go talk about what’s a container, Chris.

Chris Hickman 29:33
Right. So containers at a top level, they kind of look and feel a lot like a virtual machine. So containers, again, allow you to run multiple instances of something virtually on the same piece of physical hardware. Containers even allow you to specify things like the operating system, which makes it super confusing, right? Because we just talked about how virtual machines are virtualizing the entire server, including the operating system. Containers kind of are doing some similar things as well, so it gets kind of confusing. Probably the best way to think of this is, in virtual machines, they’re virtualizing the entire server. Containers, they’re going down a level on what they’re virtualizing. They’re actually virtualizing the operating system, right? So they’re taking that hardware virtualization out of the equation, it’s purely just software. And so that’s the big key difference between these two. So that when you hear folks say, “Oh, containers are lighter weight than VMs,” this is one of the big reasons, right? It’s basically, what they’ve decided to do, the amount of functionality that they’re doing is reduced in scope, right? It’s virtualizing purely just the operating system and not the entire server itself, and so that is one of the primary differences between the virtual machine and the container.

Chris Hickman 31:15
And by doing that, by having the virtualization done at the OS instead of the server level, there are a lot of advantages, right? So one of the problems with virtual machines if you’re kind of running multiple copies of the same application, is that it’s really– if you’re running each one of those applications in its own virtual machine, there’s a lot of duplication going on, right? Because you’re virtualizing the hardware for every single time you want to run your application, right? And there’s really no reason to do that, there’s just a lot of overhead there. And why should you have eight virtual copies of the storage system, you don’t need that. One would suffice. And that’s kind of like where containers come in, right? So containers are like, “No, really we want eight virtual copies of the operating system in our application, but we don’t need eight virtual copies of the disk.” So we’re going to rely on the virtualization of the hardware by something else, the common layer if you will, and only duplicate the stuff that we need, right? So it’s just an optimization of what it is that you’re virtualizing, to make sure that it fits your use case.
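Chris’s point about duplication can be put in numbers. The sketch below uses entirely hypothetical figures (a 64 GiB host, an app that needs 1 GiB, roughly 2 GiB of guest OS overhead per VM, and about 50 MiB of per-container overhead) and ignores the host’s own footprint. Real values vary widely; the point is that an overhead paid once per copy decides how many copies fit.

```python
def instances_that_fit(host_mib: int, app_mib: int, overhead_mib: int) -> int:
    """Copies of an app that fit in host memory when every copy also pays a
    fixed overhead (a full guest OS for a VM, a thin layer for a container)."""
    return host_mib // (app_mib + overhead_mib)


HOST_MIB = 64 * 1024  # hypothetical 64 GiB host
APP_MIB = 1024        # hypothetical app needing 1 GiB

vm_copies = instances_that_fit(HOST_MIB, APP_MIB, overhead_mib=2048)       # guest OS per copy
container_copies = instances_that_fit(HOST_MIB, APP_MIB, overhead_mib=50)  # shared kernel

print(f"VMs: {vm_copies}, containers: {container_copies}")  # VMs: 21, containers: 61
```

Working in whole MiB keeps the division exact; with these assumed numbers, sharing one kernel nearly triples the number of copies the same machine can hold.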

Jon Christensen 32:29
Right. I just thought of this kind of interesting way to say what a container is, that may hit what you just said. So maybe you can think of a container– so we all know operating systems run programs. They run Microsoft Word. They run Outlook. They run your browser. They run programs. So maybe we can think of a container as a program whose job is to behave like it’s a whole computer. Is that close to the mark?

Chris Hickman 33:03
Very, very interesting because, I mean, that gets really close to how containers technically work. So at the end of the day, a container ends up being a program running within an operating system, and that’s how they could implement it. So I would say, yeah, absolutely. That’s a very fair and a good way of categorizing it.

Jon Christensen 33:27
Cool. And so, in that way, if we were software developers and we were writing a program– say, we were writing a browser. One of the things we would do while we were writing the browser is say, “Okay. Well, let’s make a network connection, so we can go get some web page.” And the code that we write to go make that network connection to get that web page, would rely on the operating system’s– we would have no idea how the operating system did that. We would just say, “Operating system, go do your thing to talk to network drivers. Or whatever you need to do, operating system,” so that I can write this high-level code that just says, “Get me a socket connection out to this URL, and bring me back the contents and then I’ll display them.” And that’s what we would do if we were writing a browser, and I think that that’s essentially what a container is doing too. It’s like it’s written as though all of the features of a computer are just kind of baked into this running program, so it’s depending on the operating system to do anything that it needs to do. Like talk to a network, like talk to the disk, or memory, or anything, right?

Chris Hickman 34:46
Absolutely. And at the end of the day, that’s kind of what Docker itself is doing. So we’re talking about virtual machines and containers, one implementation of containers is definitely Docker. And the containers themselves – again, they’re virtualizing the operating system, doing their thing – it’s actually Docker on the outside of those containers that’s kind of providing those kinds of services to talk to the hardware and to– and Docker itself may then be running inside a virtual machine, or it could be running on bare metal. But there’s that– again, that it’s kind of providing that layer of talking to hardware devices. So that, absolutely, when it says, “Hey, I need to go make a socket connection,” that kind of functionality is not inside the container itself, rather it’s going through the container to something above it that’s now giving it that functionality. And that’s why containers get to be so much smaller, because again the scope of what it is that they’re implementing is reduced.
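Jon’s browser example can be sketched in a few lines of Python using only the standard library. The `fetch` function asks for an HTTP connection and reads the response. The socket, TCP, and any network driver work happens in the operating system underneath it. Run inside a container, the same code would behave the same way, with the host kernel (as configured by Docker) doing the networking. The local server exists only so the example runs without internet access, and `HelloHandler` and `fetch` are names made up for this sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local web server so the example needs no internet access."""

    def do_GET(self):
        body = b"hello from the other end of the socket"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """The 'browser' side: ask the OS for a connection and read the reply.
    Nothing here knows about network cards or drivers."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    content = fetch("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(content.decode())  # hello from the other end of the socket
```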

Jon Christensen 36:02
Right. And like there’s a thing that I’m thinking about, and I’m nervous to bring it up because I’m afraid that it gets a little too meta. But there’s this concern that I have when I talk about this, that we’re talking about a program that’s running on an operating system and the program’s job is to behave like it’s a whole computer. I fear that that’s kind of a confusing thing to say, like how can a program behave like it’s a whole computer? How is that possible?

Chris Hickman 36:34
Yeah. And this is due to some features that were added to the operating systems themselves, years back. And it has to do with just isolating– there are features on the operating system that allow programs to be isolated from other programs, and so call it process isolation. There’s some very specific technical features that allow that, things like namespacing. So namespaces restrict what a process can see between itself and other programs on that operating system. And there’s also things like cgroups. And cgroups are again kind of an operating-system-level primitive, that restrict what a particular program can do. So you end up– that’s how you can build these containers as programs running on an operating system, use some of these operating system capabilities to again just sandbox them. So you limit what it is that they can see, and you can limit what they can do. And it’s basically slicing up the resources on that computer. So that as far as these programs are concerned, that’s all that exists because that’s all that was allocated to them. Or those are the– that was what they were told that they could see when they were instantiated.
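On Linux you can look at the namespaces and cgroups Chris describes for any process through the `/proc` filesystem: `/proc/<pid>/ns/` holds one symbolic link per namespace type, and `/proc/<pid>/cgroup` lists the control groups the process belongs to. The helper functions below are a minimal sketch (their names are our own) that read those files for the current process and return empty results on systems without them.

```python
import os


def list_namespaces(pid: str = "self") -> dict:
    """Return {namespace type: identifier} for a process, e.g.
    {'net': 'net:[4026531840]', ...}. Linux only; {} elsewhere."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries may be unreadable without privileges
    return result


def list_cgroups(pid: str = "self") -> list:
    """Parse /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples.
    On a cgroup v2 system there is a single '0::/path' line."""
    path = f"/proc/{pid}/cgroup"
    if not os.path.exists(path):
        return []
    entries = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            hierarchy_id, controllers, cgroup_path = line.rstrip("\n").split(":", 2)
            entries.append((hierarchy_id, controllers, cgroup_path))
    return entries


if __name__ == "__main__":
    for kind, ident in list_namespaces().items():
        print(f"{kind:20} {ident}")
    for entry in list_cgroups():
        print(entry)
```

If you run this script once directly on a Linux host and once inside a Docker container on the same host, you would typically see different identifiers for namespaces such as `pid`, `net`, and `mnt`. That difference is the isolation that lets a container’s processes act as if they had the machine to themselves.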

Jon Christensen 38:12
Right. Right. And then just to kind of also address the thing that I said was maybe a little bit too meta, I’m just going to go for it. I’m just going to say what’s on my mind. So a computer is this thing that can take inputs from all kinds of places like keyboards, and mice, and networks, and disk, and memory, and process it. And when I say process it, I mean that it can run instructions that tell it to do stuff with that input, and then create outputs usually on the screen or out of the speakers, or whatever. That’s what a computer does, and there’s no reason that a program can’t do all same things itself. So a program can also be running on a computer, and the program can say, “Hey, let me just listen to everything as though I’m a whole computer, and let me just [inaudible] I’m getting it from wherever. Maybe they’re from the disk or they’re from some network code that I loaded up into the memory that I have access to based on my sandbox, and then let me put some outputs out there.” So the program itself can behave as though it is a whole computer. I don’t know if that was too meta, but I just– that was on my mind [laughter].

Chris Hickman 39:34
No, absolutely. And that’s why we’re sitting here talking about this stuff: these are some pretty interesting, challenging concepts. It does get kind of meta-meta, and that’s why there’s so much confusion. The lines blur: “What’s the difference between an operating system, a program, and a virtual machine?” All these things kind of feel the same, but there are big differences between them. So we have to come up with different ways of parsing this stuff out and really understanding how they work, in a way that makes sense to all of us, as opposed to just saying, “Containers are lighter weight. You should use them.”

Jon Christensen 40:24
Right. Right. Okay. So now we understand that they’re lighter weight because they’re just running programs inside an operating system, as opposed to running the whole operating system with all of its device drivers and everything else it needs to go. So why is that important? What’s the benefit? What are we going to do with that?

Chris Hickman 40:44
Yeah. There are a lot of benefits, a lot of things this enables from an efficiency standpoint, and they really are analogous to some of the advantages of virtual machines when they first came on the scene. We talked a little bit earlier about how one of the great things virtual machines enabled was better resource utilization, right? You can beef up your hardware, run multiple servers on it, and get better resource utilization. Containers take that a step further, right? They’re another multiplier on resource efficiency. Because you’re virtualizing a much smaller part of the system, you can have more of these things running on it. Here’s the problem we have with virtual machines: you might allocate a virtual machine to your server application, and it ends up using only 20% of the resources on that virtual machine. And it’s not very practical or easy to run more than one of your servers on it. With containers, we now have a very easy way to say, “We can run five of our server applications inside that one virtual machine and get maybe 80% or 85% resource utilization.” So it didn’t cost us any more money, but now we have four or five times the throughput and capacity to run our applications.
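Chris’s utilization numbers can be turned into a quick back-of-the-envelope calculation. This sketch assumes every app uses the same 20% of a VM and that you stop packing at about 85% to leave headroom; the headroom target and the 20-app fleet are illustrative assumptions, not measurements. Integer percentages are used to avoid floating-point rounding surprises.

```python
def pack(app_pct: int, target_pct: int) -> tuple[int, int]:
    """Apps per VM and resulting utilization (%), for identical apps.

    app_pct:    share of the VM one app uses on its own (the 20% example)
    target_pct: how full we are willing to run the VM, leaving headroom
    """
    apps = target_pct // app_pct
    return apps, apps * app_pct


def vms_needed(total_apps: int, apps_per_vm: int) -> int:
    """Ceiling division: number of VMs required to host every app."""
    return -(-total_apps // apps_per_vm)


apps, util = pack(app_pct=20, target_pct=85)
print(apps, util)              # → 4 80
print(vms_needed(20, 1))       # one app per VM → 20
print(vms_needed(20, apps))    # four containers per VM → 5
```

Under these assumptions, a fleet of 20 apps drops from 20 VMs to 5, which is the “four or five times” factor Chris mentions.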

Chris Hickman 42:26
So that translates into very real savings, right? Before, I had to purchase maybe five virtual machines. Now, I just need one. That’s a huge cost savings, so that’s a big, big benefit. Another big benefit is that because these containers are smaller, and what they’re virtualizing is much less, they instantiate much more quickly. It wasn’t very long ago that when you instantiated a new virtual machine in the cloud, say on Amazon Web Services, it could take 10 minutes for that virtual machine to actually boot up. Again, you’re virtualizing the entire server, so it’s got to load all the hardware drivers and support, and initialize everything else. Ten minutes can be a long time when you need something quickly, when you need a new application up and running. With containers, it’s a fraction of the time. You can have containers that start up within seconds. So you’ve gone from 5 to 10 minutes of startup time down to seconds. That is a great efficiency improvement, especially in software development when you’re deploying changes to your code. It makes deployment much easier; you don’t have as many worries like, “Am I going to have some downtime? How am I going to coordinate the switch-over?” Because the window for instantiating these things is so short, a lot of those issues go away. So those are two of the really big benefits of containers.
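The startup-time difference compounds during a rolling deployment, where instances are replaced one at a time. This sketch assumes five instances, a 10-minute VM boot (the upper end of Chris’s range), and a 5-second container start (an assumed value for “within seconds”); it ignores health checks, connection draining, and image downloads.

```python
from datetime import timedelta


def rolling_deploy_time(instances: int, startup: timedelta) -> timedelta:
    """Total time to replace instances one at a time, waiting for each to boot.

    Only boot time is counted; real deployments add health checks and draining.
    """
    return instances * startup


vm_total = rolling_deploy_time(5, timedelta(minutes=10))
container_total = rolling_deploy_time(5, timedelta(seconds=5))
print(vm_total)                     # → 0:50:00
print(container_total)              # → 0:00:25
print(vm_total / container_total)   # → 120.0
```

Under these assumptions a change that took most of an hour to roll out finishes in under half a minute, which is why the downtime and switch-over worries Chris mentions shrink.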

Chris Hickman 44:17
Because they’re smaller, the software definition of these containers also takes up less storage space, so they’re easier to move around. A full virtual machine image could be gigabytes. A container image, with the operating system’s user-space files and your application, ready to go, could be maybe 20 or 30 megabytes if you’ve optimized it. So it’s a huge decrease in the size of these images, the software definitions of these things, which makes them easier to pass around and move around.
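The size difference shows up directly in how long it takes to ship an image across a network. This sketch assumes a 4 GB VM image (a hypothetical value for “gigabytes”), a 25 MB container image (the middle of Chris’s 20 to 30 MB range), and a 100 Mbit/s link; it uses decimal megabytes and ignores protocol overhead, compression, and layer caching.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Seconds to copy size_mb megabytes over a link of link_mbps megabits per second."""
    return size_mb * 8 / link_mbps  # 8 bits per byte


vm_image = transfer_seconds(size_mb=4_000, link_mbps=100)      # assumed 4 GB VM image
container_image = transfer_seconds(size_mb=25, link_mbps=100)  # mid-point of 20-30 MB
print(vm_image, container_image)   # → 320.0 2.0
```

Roughly five minutes versus two seconds per copy, a 160x difference under these assumptions.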

Jon Christensen 45:07
Right. Something a skeptic who’s listening might be thinking, which I think we want to save for another conversation, is: “Well, if you can fit so many of these things on there, and that’s so great, and if each of these things is really only running one process or one application, why not just put the applications directly on the virtual machine, or directly on the hardware you’re going to use? Isn’t that a way more efficient use of all that capacity?” That’s kind of the core question of what Docker containers are all about, and I think we could spend a long time on it in another conversation, but I just wanted to voice that particular skeptic’s question and leave it for later.

Chris Hickman 45:55
And that’s a great point. Because at the end of the day, we talk about VMs and containers, and maybe that implies a hierarchy where containers need VMs, and that’s absolutely not the case.

Jon Christensen 46:06
Right, right.

Chris Hickman 46:07
You can actually [crosstalk]–

Jon Christensen 46:09
Go ahead.

Chris Hickman 46:10
I was just going to say, you can run containers on bare metal and there are really good reasons for doing that.

Jon Christensen 46:17
Cool. And so leaving that aside, I think the one last question that we can ask about containers is, can you use them in high-performance situations? We’re potentially a couple of layers away from that bare metal when we’re running inside a container. So can a high-performance situation– can you take advantage of containers, or do you kind of need– when you’re dealing with lots of processes– or, sorry, lots of instructions per second, do you need to get down to the metal?

Chris Hickman 46:50
Again, it’s going to be totally up to your environment, your requirements, and what performance means to you. By and large, any performance penalty you pay for running inside a container probably ends up not being noticeable for most apps. Most apps out there, especially in the web world, are what we call IO-bound. Basically, a lot of the time, they just sit around waiting, right? They’re waiting for some response to come back over the Internet. Say you have a mobile application on a spotty 3G connection. Your server is just sitting there waiting for that network packet to come back. So a lot of the time, it’s not maxed out performance-wise. Scalability, handling many of these connections at once, ends up being more important, and for those kinds of workloads containers work just fine. There are cases where you might want even better performance. I don’t know the exact numbers, but let’s say you’re paying maybe a 20 or 25 percent performance penalty for running with that abstraction, the layers of virtualization involved. If that ends up being too much and you want better performance, there are absolutely ways around it. You can still use containers, but instead of running them on top of a VM, run them on bare metal. Now you’re skipping the VM layer; you can think of it as removing one of the hops and getting better performance.
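The reason an IO-bound app barely notices a compute penalty can be worked through numerically. The 25% figure below is Chris’s self-described rough guess, not a measurement, and the 190 ms of waiting versus 10 ms of computing per request is an assumed example of an IO-bound request. The sketch assumes the penalty applies only to time spent computing, not time spent waiting on the network.

```python
def slowdown_pct(io_ms: float, cpu_ms: float, cpu_penalty: float) -> float:
    """Percent increase in request latency when only the CPU portion pays a penalty.

    io_ms:       time spent waiting on network/disk (assumed unaffected)
    cpu_ms:      time spent computing
    cpu_penalty: fractional overhead on compute, e.g. 0.25 for 25%
    """
    baseline = io_ms + cpu_ms
    penalized = io_ms + cpu_ms * (1 + cpu_penalty)
    return (penalized - baseline) / baseline * 100


io_bound = slowdown_pct(io_ms=190, cpu_ms=10, cpu_penalty=0.25)
cpu_bound = slowdown_pct(io_ms=0, cpu_ms=200, cpu_penalty=0.25)
print(f"{io_bound:.2f}% {cpu_bound:.2f}%")   # → 1.25% 25.00%
```

The same overhead that costs a CPU-bound job a quarter of its speed adds barely one percent to a request that spends most of its life waiting.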

Jon Christensen 48:47
Right. I think I have another thing to add that will help. If the container is doing something that requires a lot of context switching, if it’s making heavy use of the operating system to figure out what instructions to run, then it feels like there might be a bigger hit than if the container is doing a single thing. An example: within a container, you can get direct access to some NVIDIA APIs. When you call those NVIDIA APIs, you’re really no different from any other program running on the operating system calling those same APIs. Those calls get translated down through the operating system to talk directly to the NVIDIA GPU. So the container talking to the NVIDIA GPU isn’t going through any extra hops compared with any other application, because, again, the container is just a program running on the operating system, just like a game would be. So for that kind of work, it seems like the container doesn’t add any additional overhead. But like I said, when the container is doing a lot of switching between different pieces of work, it’s almost acting like a software operating system, trying to do work that is typically done at a lower level on the machine, by the operating system’s kernel itself. Did that make sense?

Chris Hickman 50:42
Yeah. Yeah. I’m just sitting here thinking through the analogies myself. It’s very fun asking these questions and trying to understand what’s going on, and I think a lot of this requires some additional research on our part. It might lead to future discussions as well, where we go even a little deeper and understand how these things work.

Jon Christensen 51:13
Right. So I think we can conclude here. The hope, and the conclusion, is that based on this conversation, we’re confident it’s useful to know the real difference between containers and virtual machines, beyond just “containers are lightweight.” And by diving into a couple of specifics in those differences, we’ve realized there’s a ton of stuff that could be useful to know when you’re making a really important decision about high-performance computing or being able to scale quickly. We never even touched on the security aspect.

Chris Hickman 51:49
Which is a big, big consideration, and that’s something I’m kind of excited to talk about as well.

Jon Christensen 51:54
Yeah. Yeah. So, we’ll get there. We’ll get there. We’re doing this every week [laughter]. Rich, thank you for– so I just want to remind people that a big part of Rich coming along for this ride is making sure that we don’t go too deep into the woods of technical jargon as we have this conversation. So what do you think, Rich?

Rich Staats 52:18
Yeah. So I have a couple of notes that are the way that I’m wrapping my head around this. And so if I could, I’d just like to run through them really quickly to make sure that I’m correct with these assumptions, but–

Jon Christensen 52:30
Sure.

Rich Staats 52:31
The way I’ve put this into a narrative for myself is that it comes down to the idea of resource utilization of the server, or the computer itself. You have this computer and you put an operating system on it, but that operating system only uses a percentage of the resources, so it’s underutilized, right? Virtual machines allow you to extract more value out of the computer, and it’s faster and more economical to do it that way than to build other machines. So the advantage of using a virtual machine over building other machines is probably cost and time.

Rich Staats 53:21
Containers allow you to extract even more value by utilizing more of that machine’s resources, and they do it faster, right? For instance, they don’t have to load the drivers. Maybe my biggest challenge is that I’ve looked at these things as being inherently different, as opposed to one being the evolution of the other, right? You had to have virtual machines to really understand how you could build a better product, and that better product is what people now call containerization. That’s why there’s so much overlap between the two: they share a lot of the same characteristics. It’s just that containers are more efficient and lighter weight, without the need to load everything each time you need them. Is that accurate?

Jon Christensen 54:20
Yeah. Definitely accurate. When I think about your business, Rich, doing WordPress deployments, you have lots and lots of clients. We know some of them are very small mom-and-pop web shops, so they definitely don’t need their own full server. Beyond that, they may not even need their own full virtual machine. A website that gets hit a few dozen times per day definitely doesn’t need to live on a machine that can only be split into three, four, or maybe six virtual machines if it’s a really big one. So a container running WordPress for them is probably sufficient, or maybe a couple of containers, so that if one goes down, the other is still there. That would be an absolutely perfect use of containers. My expectation for you and your company is that you probably won’t need to deal with containers directly; some other service will come along and become a WordPress hosting provider that uses containers under the covers. But it would be cool if you found that your particular needs and set of clients meant that whatever hosting providers came out weren’t quite what you needed, and you ended up building your own solution with containers.

Rich Staats 55:49
Yeah. I’m just interested in– I’d probably build it anyway, just because I can’t help but do stuff like that.

Jon Christensen 55:55
Right. Very cool. All right. Well, I think we can wrap it up for today. And we’ll figure out what we’ll talk about next week, but we definitely left some open questions on the table. Thanks, Rich. Thanks, Chris.

Rich Staats 56:06
Yeah. Thank you.

Chris Hickman 56:07
Thanks, Jon, Rich.

Jon Christensen 56:08
Bye-Bye.
