January 22, 2020

96. The Future of Containers – Part 2 – Making Sense of MicroVMs (continued)


Summary

Maybe you’ve heard some of the buzzwords everyone seems to be talking about when discussing the future of containers. Strange words like “microVMs”… “unikernels”… “sandboxes”.

Have you wondered what these things are and how you can use them? Or, for that matter, should you use them?

In this episode of Mobycast, Jon and Chris continue their three-part series on the future of containers. We go deep on the most talked about microVM – AWS Firecracker. We learn how Amazon uses Firecracker and its tremendous benefits. We then discuss how to use Firecracker for your own containers and get the same great results.

Companion Blog Post

The Future Of Containers – What’s Next

Show Details

In this episode, we cover the following topics:

We revisit a misunderstanding from last week’s show to find out exactly what the Firecracker team means when they list “Single VM per Firecracker process” as a security benefit.
We discuss what’s next on the Firecracker product roadmap, with particular emphasis on support for snapshot/restore.
We learn how AWS uses Firecracker in production today with AWS Lambda.
AWS is currently working on updating Fargate to use Firecracker. We look at why they are doing this and the design details of updating Fargate to use Firecracker.
We finish by looking at how you can use Firecracker for your own containers, by incorporating Firecracker-aware tooling into your container infrastructure. Specifically, we look at firecracker-containerd and Weave Ignite.

End Song

Thing Is by Public Address

More Info

We’d love to hear from you! You can reach us at:

Web: https://mobycast.fm
Voicemail: 844-818-0993
Email: ask@mobycast.fm
Twitter: https://twitter.com/hashtag/mobycast
Reddit: https://reddit.com/r/mobycast

Stevie Rose: Maybe you’ve heard some of the buzzwords everyone seems to be talking about when discussing the future of containers. Strange words like microVMs, unikernels, sandboxes. Have you wondered what these things are and how you can use them? Or for that matter, should you use them?
In this episode of Mobycast, Jon and Chris continue their three-part series on the future of containers. We go deep on the most talked about microVM, AWS Firecracker. We learn how Amazon uses Firecracker, and its tremendous benefits. We then discuss how to use Firecracker for your own containers, and get the same great results.
Welcome to Mobycast, a show about the techniques and technologies used by the best cloud native software teams. Each week, your hosts, Jon Christensen and Chris Hickman, pick a software concept and dive deep to figure it out.

Jon Christensen: Welcome Chris. It’s another episode of Mobycast.

Chris Hickman: Hey, Jon. It’s good to be back.

Jon Christensen: Good to have you back. So here we are. Today we’re in the middle of a series that is one of the hottest series ever on Mobycast. I’m so excited about this series. We’ve been talking about microVMs, and nobody knows that much about them, except for the people that are making them, and a few people that aren’t listening right now.

Chris Hickman: Those five people in the world.

Jon Christensen: Right. I guess we should get into a recap, but before we get into the recap, I guess a lot of what we’re going to talk about today is going to be specifically about Firecracker. We’re going to get into some more details. And where we left off last week, we’ll do the recap, but I want to kick off the recap with this idea that I had, that microVMs, and how they work, and how they’re orchestrated, seems to me, from what I’ve learned so far, to be inspired by how container orchestration and container run times and things like that work. So I was trying to draw some parallels between the two, in my mind. Maybe we can keep that in mind, or maybe you can just totally disagree or flip it on its head for me, Chris. But in either case, maybe we can start with a recap.

Chris Hickman: Sure. Yeah. So let’s first start with a recap. So last episode, we kicked it off. We started off with, hey, the current landscape is, we’ve talked about virtual machines, we know about that. It’s full virtualization, strong isolation, really great security story, but heavyweight. And then we have containers that came along, and containers are virtualization at the OS level. They’re much faster, much more resource efficient, and not only that, they’re virtualizing and abstracting at the application level. So it comes to us like this perfect natural unit of abstraction for us to do our cloud native apps. Versus virtual machines, where you’re virtualizing the entire server, right? Just packaging ends up becoming a lot more difficult, if it’s a VM versus a container. And so we want to deploy apps, not servers.

Jon Christensen: Yes.

Chris Hickman: So containers have that rich ecosystem that really supports us on that. But now, the issue that we’re faced with is, by getting that great abstraction at the application level, and we have this great speed and performance, and we have the resource efficiency, we gave up some of those security benefits, because it doesn’t have that strong isolation. So now we’re looking back and saying, “Hey, what if we could have that isolation that we got with the VMs, but still keep this model of all the good stuff that we get with containers?” And so, that’s where we’re going here in the future. There’s a number of different techniques and camps and projects that are going on and addressing that.
And so we introduced the concept of, okay, there’s microVMs in this space. We also have something called unikernels. And then another flavor would be sandboxes for containers. So we’re in that process now of, let’s go describe these things. And we started off with microVMs, explaining what those are, and now we’re going through the treatment of two of the most popular microVMs, or the most active ones, being Firecracker from AWS, and then Kata Containers, which we’ll be talking about a little bit.

Jon Christensen: That was such a funny little catch, Chris. It’s like you wanted there to be room out there for there to be a more popular one that’s less active.

Chris Hickman: Yeah. And the other thing is too, this is all pretty new. Especially this concept of microVM space. There’s not a lot of contenders out there. And this is one of the things I hope we really get across during these episodes as well, is just, what does this mean to you? Do you have to do anything? How do you take advantage of this? Is there anything you need to do? What is that information that you need to know about? So, we’ll try to help break that down as we go through this.

Jon Christensen: Spoiler alert, it’s probably not, “Since it came out, you need to be using it in production.” That’s probably not the answer we’re going to get.

Chris Hickman: Yeah. We’ll go through it, and we’ll see what it is. But yes, chances are there’s probably not a lot that you’re going to have to do, other than be aware of some of these options. And so, that actually is a good segue into what you started at the top by saying, “Hey, I’m thinking of this as, there’s a correlation between runC and”-

Jon Christensen: Just a quick reminder that runC is a way of creating a container, and running it. And then the other process that we talked about with containers, was containerd and that was a way of looking across containers, being able to see what’s running, and how they’re running, and things like that. So, containerd is a daemon process, a service that manages containers, and runC is a thing that creates an instance of a container.

Chris Hickman: Right, absolutely. So, that defines the spectrum of the runtime support for containers. So containerd is that high-level runtime environment, and runC is the low-level runtime. So runC is using the syscalls to go and create containers and destroy them. And then the high-level runtime, containerd, and this is all in the Docker space, there’s other runtime systems out there and whatnot. But as far as containerd and runC go, containerd is that high-level one that’s now managing more at the abstraction level. It’s dealing with things like containers, and snapshots, and images and whatnot, so that it’s much easier to work with. As opposed to, what’s the syscall for actually directly manipulating cgroups, and namespaces and whatnot?
So in this space now, with microVMs, you’re creating VMs first, and then the container can run inside that. And so, what we’re seeing with microVMs is, they need a way to integrate into this ecosystem. So we talked about first there were VMs, then there were containers, and containers have been around now for a long time, very rich ecosystem with the entire tooling set from start to finish, from when you start developing an app all the way to deployment, and observability, and monitoring, and maintenance of that app. And so here’s something new that comes along, it can’t be completely different and disruptive, otherwise you’re throwing away all of that ecosystem, that tool set.
So, that’s what these projects are looking at. How do they integrate into that? So that, as far as users are concerned, it’s no different. So I think that’s the primary motivation that’s driving these integration points. They actually didn’t really have much of a choice, the microVM implementers. That was the natural point of abstraction, to integrate into the system, and saying, “Okay, we’re now at that level, that low-level runtime.” But as far as everything above it is concerned, it’s creating a container. But instead we’re intercepting that, and first we’re going to create a VM, and then we’ll create the container.
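The layering described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how containerd, runC, or any Firecracker integration is actually written: the class names (`HighLevelRuntime`, `RuncLikeRuntime`, `MicroVMRuntime`) and the step strings are made up for the example. The point is that the high-level runtime calls the same "create a container" interface either way, and the microVM-aware runtime quietly boots a VM first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class LowLevelRuntime(Protocol):
    """Hypothetical interface: anything that can create a container."""

    def create(self, container_id: str, image: str) -> List[str]: ...


class RuncLikeRuntime:
    """Stands in for a runC-style runtime: the container runs on the host kernel."""

    def create(self, container_id: str, image: str) -> List[str]:
        return [
            f"set up namespaces and cgroups for {container_id}",
            f"start {image} as {container_id}",
        ]


class MicroVMRuntime:
    """Stands in for a microVM-aware runtime: boot a VM, then create the container in it."""

    def create(self, container_id: str, image: str) -> List[str]:
        vm_id = f"vm-{container_id}"
        return [
            f"boot microVM {vm_id}",
            f"start {image} as {container_id} inside {vm_id}",
        ]


@dataclass
class HighLevelRuntime:
    """Stands in for a containerd-style daemon that tracks containers and delegates creation."""

    low_level: LowLevelRuntime
    containers: Dict[str, str] = field(default_factory=dict)

    def run(self, container_id: str, image: str) -> List[str]:
        steps = self.low_level.create(container_id, image)
        self.containers[container_id] = image  # bookkeeping lives at the high level
        return steps


# The caller's code is identical for both runtimes; only the low level differs.
for runtime in (RuncLikeRuntime(), MicroVMRuntime()):
    daemon = HighLevelRuntime(runtime)
    print(daemon.run("web", "nginx:latest"))
```

Swapping the low-level runtime changes what happens underneath without changing anything the user of `HighLevelRuntime` sees, which is the integration point Chris is describing.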

Jon Christensen: Ah, okay. So the piece that… I was going along being confused, because I was like, I’m not talking about any integration here, I’m just saying that starting, stopping, and managing microVMs feels like a very similar problem to me to starting, stopping, and managing containers. So if I had never done this with microVMs before, I might be like, “Well, let’s see how we do it with containers. Oh look at that. There’s a process that starts and stops them. And then there’s this other process that manages them. That seems like a good way to do it.”
And so that was my point. Not that the Firecracker process was the same as runC, or doing the same exact thing as runC, or had any integration with Docker or containers whatsoever, but just that it feels like the way the whole system is set up is inspired by the way containers are started, stopped, and managed. And then later, potentially, if there’s a quote unquote Kubernetes of Firecracker, I would be totally unsurprised.
But what you’re saying is, a lot of the actual job, especially around the stuff in AWS, is to actually start a container when you’re starting a microVM, and run that container within the microVM. And in order to do that, you actually have to integrate with the container stuff. And so there is a point where the two meet, microVMs and container stuff. That’s also interesting. I have a lot to learn there. I’m not sure. I don’t even know if we’ll cover much of that in our episodes, but I don’t think that’s the only thing that Firecracker is for. Right? It’s not just for running containers, it could potentially run other things, right?

Chris Hickman: It could, but this is really the natural evolution, and this is, I think, the primary use case for it.

Jon Christensen: Okay.

Chris Hickman: Think about it. We have VM systems out there that are very robust. Full support, and all the features, and just everything that you need with VMs is already out there. So why would you need a microVM then? It’s like, well, microVMs become necessary when you want to create just thousands of these. And they’re short lived, or you just need them to be really, really lightweight. They don’t need much in the way of emulation. And they need high performance, and then they go away. It’s like that single process model, almost. Thinking about it.

Jon Christensen: It’s like a single container per microVM sort of thing?

Chris Hickman: Yeah, exactly. That is the primary use driving this. And that’s one of the reasons why I like… MicroVMs probably they wouldn’t have happened if it wasn’t for containers. Now that’s the problem that it solved. There’s really nothing wrong necessarily with VMs being full featured and heavyweight, if you’re not creating lots of them, creating a whole bunch of them and then tearing them down and whatnot. So it’s really driven by this use case of containers.

Jon Christensen: So by having Firecracker be totally tuned to run a container, I almost imagine it in the sense of biology, it’s like a receptor that’s totally tuned to accept this protein. You know what I mean? It’s like the shape of a container has a thing that it needs, and Firecracker is it. I’m going to make it.

Chris Hickman: Yeah, absolutely. AWS, they tell us this right there on the homepage for Firecracker, it is purpose built. And they even go as far as to say, for serverless applications. This was built for Lambda, no ifs, ands or buts. This was built for Lambda. So it was built for running functions securely in an isolated way, at massive scale.

Jon Christensen: Well, maybe that’s where my confusion was, because Lambda, when you talk about Lambda, you don’t talk about containers, you talk about zip files, you talk about layers, all kinds of stuff. But you don’t talk about containers. And if Firecracker is built to run a container, then the secret is, guess what? Your Lambda function is actually running in a container, by the way.

Chris Hickman: Maybe taking this step… So as far as the actual implementation goes, does Lambda package your function as a container and run that? Probably not. It’s literally just the files, just the bare-bones files. But again, it’s this space of, it’s really just a single process. And whether it’s packaged as a container, or if it’s not packaged as a container, it doesn’t matter anymore, because it’s now isolated inside that VM. So think of it more along the lines of, just what’s the minimal amount of features that you need from a VMM to run this very simple thing? It ends up being very, very similar for running a function, like a Lambda function, versus running a container as well. Again, that purpose-built thing, think of it as, it’s just a single thing running inside this microVM.

Jon Christensen: I don’t know, Chris. You’re doing a little dancing there. That was a little dance, right? Because we were like, “Firecracker is just absolutely the perfect shape to run containers.” And then it’s like, “But actually it also has the perfect shape to run Lambda and that might not be in a container.” So there might be two. What does that mean? Does that mean there’s a special Lambda Firecracker, and another open source Firecracker?

Chris Hickman: The difference between a container and a function, they’re just different processes, but they’re all just still just one process. And they both need controls over-

Jon Christensen: No, I’m not going to let you get away with that… Yes, it’s a process, but it’s got special things about having cgroups, and what file system it has access to, what memory it has access to. It’s a very, very, very specific process, and that’s what I’m talking about when I talk about the shape of a container. I was starting to go down this rosy path of thinking, oh yeah, and Firecracker’s the perfect shape to give that shape a hug. But then when I think about the shape of just some process that runs a function, that could have a much smaller, simpler shape than a container. I can write that function in three lines of code and run it, and the process that it’s running in is not going to look anything like a container process. See what I mean? See why I’m like, “Come on man, I don’t know about this.”

Chris Hickman: Yeah. Again, there’s nothing specific to Firecracker like that locking into the shape of a protein so that these two things lock up. This is a VMM that is just doing the absolute bare minimum to run a server process, that doesn’t have a keyboard, doesn’t have a video display or anything else, none of those bells or whistles. And we have requirements: we need it to be able to spin up really, really fast. We want it to be as resource efficient as possible. We want to be able to tune it, and say exactly how much memory it gets, and how much CPU it gets.

Jon Christensen: Right.

Chris Hickman: And so those are the primary characteristics that they have when they go build it. Because that’s exactly what Lambda is. But they just didn’t go… So when they built it, they just said, “We want a VM, in order to run that function.”
So that’s what they built: a VM that can be spun up very, very quickly. That can be totally controlled on resources, from both the CPU and memory standpoint. It was really geared to running server processes and to be very high performance. But it’s still just a VM, that’s running a guest OS. So then with containers, and now looking at what can we do for things like Fargate or whatever else comes along? It’s just you have this guest OS that’s tuned for high performance server applications that don’t need rich emulation of a bunch of different devices. So running the container runtime inside that guest OS, you can absolutely do that, and why not?
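To make "say exactly how much memory it gets, and how much CPU it gets" concrete, here is a rough Python sketch of the sequence of API calls used to configure and start one microVM. The endpoint paths and field names (`/machine-config`, `vcpu_count`, `mem_size_mib`, `/boot-source`, `/drives/{id}`, `/actions`) are written from memory of Firecracker's published REST API and should be checked against the API specification for the version in use. The function name `microvm_requests`, the boot arguments, and the file paths are assumptions for illustration. The sketch only builds the requests; in real use they would be sent over Firecracker's API Unix socket.

```python
import json
from typing import List, Tuple

Request = Tuple[str, str, str]  # (HTTP method, path, JSON body)


def microvm_requests(vcpus: int, mem_mib: int, kernel: str, rootfs: str) -> List[Request]:
    """Build (but do not send) the requests that size, describe, and start one microVM."""
    if vcpus < 1 or mem_mib < 1:
        raise ValueError("a microVM needs at least one vCPU and some memory")
    calls = [
        # CPU and memory are fixed explicitly, per microVM.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # A guest kernel to boot; the boot arguments here are illustrative.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # One block device holding the guest's root file system.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Finally, start the guest.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]
    return [(method, path, json.dumps(body, sort_keys=True)) for method, path, body in calls]


# Hypothetical paths; a small 1 vCPU / 128 MiB guest.
for method, path, body in microvm_requests(1, 128, "/tmp/vmlinux", "/tmp/rootfs.ext4"):
    print(method, path, body)
```

Notice what is missing: no keyboard, no display, no long device list. A kernel, a root drive, a CPU count, and a memory size are enough to describe the guest in this sketch.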

Jon Christensen: Okay. And then I guess the last piece to help me pull this all together, I’m going to try to pull two threads together to do this. One is, I was reading a Reddit thread the other day and this person was talking about Firecracker, and said that they were reading the documentation in AWS. And that they started getting confused because the Firecracker documentation started veering off into Docker land, and talking about a whole bunch of Docker stuff and they were like, “Why is Firecracker documentation talking about containerd and runC? I don’t understand.”
And then at the beginning of this whole conversation, this complete side journey we’re going on here, we also were talking about containerd and runC, and I was trying to draw some very loose parallels between the Firecracker ecosystem and the Docker ecosystem, but you actually pulled them together and talked about integration. And then that actually confused me, and caused me to think that there’s always integration, because that’s all Firecracker can do, is run Docker containers. And then I got corrected on that.
But I guess my question is, why is the AWS documentation talking about Docker? Why did you start off talking about the integration possibilities with Docker? Is it just because it’s a good fit? Or is there something more specifically custom built for Docker into Firecracker?

Chris Hickman: All right. So let me try to frame this a bit. So Firecracker in and of itself, it’s just another VM. So how are you going to use that? So we’re all running containers today. How are you going to use that? And so, think about it. If you said, “Oh, Firecracker sounds great. MicroVM, it’s super performant and it’s got all the security benefits of the normal VM, and I want to start using it.” What would that mean? With something like ECS, or Kubernetes, or Swarm, or just plain old Docker, how do you do that?
You would have to manually set it up yourself. You’d have to run some command to spawn a VM, and then if you wanted, you could install Docker or something inside that VM and run all your containers inside that VM, but you didn’t really get much. Because now you’re running all your containers inside a single microVM. So now if you want to have this tooling around it to say, well, I really want to create a microVM for every one of my containers. Well, what’s going to do that? You’d have to manually do all that stuff, so it would be a lot of work, a lot of heavy lifting that everyone’s going to have to do. It doesn’t make a lot of sense. So that’s why-

Jon Christensen: Or if you didn’t want to run in a container, let’s just think of it from the other perspective. Let’s say you’re like, “I’ve got my microVM right now, and it’s got low overhead, and it’s got good isolation from the other microVM. So I’m not going to take on the overhead of Docker. I’m just going to run my NGINX process directly in Firecracker.”

Chris Hickman: So you could do it. But that’d be such a big step back, because that’s the reason why we went to containers to begin with, is it gives us that repeatable agile process of being able to abstract at the application level, and it gives us all the packaging, the formatting. Just think about all the tooling that goes around with that. We’d be giving that up and we’d be going back to, here’s the jar file, and here I got to go install Tomcat, and I’ve got to go install this patch. It would just be going backwards so much.

Jon Christensen: Unless you could create an image of your microVM. Just do that, and then throw those things around the internet.

Chris Hickman: But then you’d have to do that for every single one of your applications. Or every-

Jon Christensen: You do for Docker. So, if you could do that with Firecracker instead of having to do it with Docker? I’m just saying Firecracker is using Docker, because by using Docker, it’s able to get away with not having to do a whole bunch of stuff that Docker already takes care of.

Chris Hickman: So actually, let me just point this out. So Firecracker actually has nothing to do with Docker, in and of itself.

Jon Christensen: Oh sure.

Chris Hickman: Firecracker team is building integrations.

Jon Christensen: Yes.

Chris Hickman: To integrate in with Docker, so that other people, other than AWS, can use Firecracker.

Jon Christensen: Right, right.

Chris Hickman: Like ADA. So AWS, when they’re running Lambda, again, they’re not using Docker with this. They don’t need firecracker-containerd, which is one of their integrations. And that’s what the Reddit user was talking about there. They’re looking on the Firecracker site and they saw that there’s this thing called firecracker-containerd. Why are they talking about Docker, and OCI and whatnot? What are they talking about? And so that is the additional piece that they’re building, to build that bridge between Firecracker and containers. Because Firecracker in and of itself, there’s nothing container specific about Firecracker. It’s just a virtual machine.

Jon Christensen: Right. Okay.

Chris Hickman: It’s just a hypervisor.

Jon Christensen: Fine, fine. I just have to say this thing also though. But as you can imagine, as you work on efficiency and performance and scale, there is still, between Firecracker and your running application, this container, and it has a lot of benefits around packaging and portability and things like that, that you could pull into Firecracker someday, to make Firecracker even more efficient. You could basically make it so that Firecracker was the place where that imaging, and portability, and same shapeness all took place, so that you could create Firecracker images. And you’d have a Firecracker image repository and you could put Firecracker into your code pipeline. You would remove just that one little piece of abstraction that we don’t technically need. It’s just there because it’s a lot less work for the Firecracker team when they can depend on Docker to do that for you, instead of having to build it themselves. But it isn’t for free. That’s the point I’m trying to make. That container, spinning it up and running it, is another couple of microseconds that they could potentially get rid of.
That’s fine. Now I’m done. I’ve made my point. But it’s important though, that point is important, because that’s what AWS and the whole cloud infrastructure ecosystem is always doing: where can we look at some piece of software that isn’t actually accomplishing anything for us anymore, and get rid of it? And then also, where is there software that we can turn into silicon? Those two things are what they’re doing as they add more and more efficiencies. And in this case, it looks like there’s maybe a little tiny bit of software that is owned by Docker that could go away someday.

Chris Hickman: Yeah. There’s always going to be improvements to do to Firecracker on the roadmap, especially with the various specialized use cases. The one thing that I firmly believe is, at the end of the day, us the users, the developers of applications, we don’t need to know about Firecracker. We want to work with containers, so we want our Dockerfiles. This is how we build containers. If we have to now go learn how to do a Firecracker build file format, which is maybe a whole different way of building images, and I need a whole new tool set, tool chain, for building those images and running those locally on my local machine and whatnot, it just becomes this whole brand new thing, where now I have to make the trade-off. Is it worth the investment versus if I can just-

Jon Christensen: Dude, AWS just announced that, and I’ve already started putting it in production.

Chris Hickman: See you in 12 months.

Jon Christensen: What do you mean we don’t want new stuff, and we don’t want to change every week?

Chris Hickman: That’s a whole other episode we can talk about. It’s just there’s so much change going on, and how do you know when to actually pull new stuff in, and kill your current best practices, to revamp them and incorporate the new, versus, if you consumed everything that was new as it comes out, you’ll just be spinning your wheels constantly and you’re never going to get anything done. So there is some pragmatic middle ground. And again, I think the good thing here is that, for us, the container ecosystem is so rich. We’re so efficient with it, and we’re so productive with it, that doesn’t need to change. Let these other things adapt to that, and we can reap the performance there. That’s the undifferentiated heavy lifting.

Jon Christensen: I like thinking of it as a protocol that we like, and then we want to stick with.
So 20 minutes of off-outline conversation, but it was really, really helpful for me. I think I feel so much better about how this all works and fits together, and what it’s for, already, with Firecracker, than I did last week. But I’m sure there’s some details on Firecracker that are in the outline that we want to get to and make sure get into your head, Mr. Listener, and Mrs. Listener, and other folks.

Chris Hickman: Sure. Okay. So we can start the episode now, is what you’re saying.

Jon Christensen: Yeah, let’s get started.

Chris Hickman: Perfect.

Jon Christensen: We cover a lot of information here on Mobycast, and if you’ve ever wanted to go back and remind yourself of something we talked about in a previous episode, it can be hard to search through our website and transcripts to find exactly what you’re looking for. Well now it’s a lot easier. All you have to do is go to mobycast.fm/show-notes, and sign up. We’ll send you our weekly, super detailed outline that we use to actually record the show. And a lot of times this outline contains more information than we get to during our hour on the air. So sign up, and get weekly Mobycast cheat sheets to all of our episodes delivered right to your inbox.

Chris Hickman: Okay. So let’s pick up where we were last time. So last time we were going through Firecracker. We had discussed, what is it? And then also what are the benefits of it? So we talked about, hey, you’ve got really strong security, performance, and efficiency benefits. You can go back and listen to that episode to get the details on it. But there are some very strong benefits there with security, performance and efficiency. So given that, let’s talk a little bit about, what’s the status of that project, and where’s it going?
So they did announce it. They released Firecracker at re:Invent 2018, so it’s been out now for a little over a year. They started using it in prod in 2018, so it’s relatively new. They have a roadmap, there’s lots of new things to add to it. Some of the big things that they’re working on right now, that are enhancements in progress, are support for other platforms. So specifically, Arm and AMD. So currently, it’s supported on Intel x86, and so they’re looking at expanding that out to be on other processor architectures. Another thing that they’re actively working on is that integration with containers, and specifically, with containerd. And we’re going to talk about that in more detail in a little bit. And then another big thing they’re working on is snapshot and restore functionality. So all virtual machine technologies have the ability to snapshot, and to restore.

Jon Christensen: Yep.

Chris Hickman: So they-

Jon Christensen: I wonder what you would do with those images if you had them?

Chris Hickman: Excuse me?

Jon Christensen: Those are images, right? That’s what I was just talking about.

Chris Hickman: Yeah, absolutely. So it’s a virtual machine image for sure. So you could absolutely come up with this… People do this. You have build systems in place now for people that are not on containers, that are actually just using virtual machines. And their build process creates virtual machine images, and those get passed around and get instantiated. So absolutely the same thing will be possible with Firecracker. So for folks that want to put in that tooling, and do the work to do that, if that fits their model, once snapshot and restore is there in Firecracker, then they would be able to do that. Lots of benefits too with snapshot and restore. So we talked about last time, for a VM, a microVM, to spin up with the guest OS, we’re looking at 125 milliseconds, pretty fast. But if you do it from a snapshot, they’re talking five milliseconds.

Jon Christensen: Wow.

Chris Hickman: Right. So just almost instantaneous, that you would have an up and running VM if you’re instantiating it from a snapshot.

Jon Christensen: Wow.

Chris Hickman: So, again, if you have just a handful of core applications that you really want to tune for, and you have just thousands upon thousands of these instances that you want to have, then something like this is probably going to be really, really attractive to you.

Jon Christensen: Yeah. That’s amazing.

Chris Hickman: So their snapshots are minimal, and it’s basically just taking a snapshot of the virtual machine state. So things like KVM internals, and register values and whatnot. It’s snapshotting devices. So the user configuration and the virtio internals. And then also it’s doing a snapshot of the memory. That’s one of the really interesting things here. When it takes a snapshot of the memory, what that produces is basically a memory-mapped file.
And so the great thing about it… Think about it. When we spin up virtual machines, or containers, we say, “This is how much memory we want,” and it’s a fixed reservation. So we say, “Oh, this is going to get two gigs,” or something like that. But maybe your application only uses a hundred megs, but it’s still going to use up the full two gigs, it’s going to reserve that. Well with these snapshots, and because they’re memory-mapped files, if you were only using 100 megs when you snapped it, that’s all it’s going to use when it starts back up, when it restores. So it’s a really ideal way of compacting your memory usage, and resource usage. So, pretty interesting. So that’s something that they’re working on. It’s not yet available, but it will be. I would imagine sometime this year.
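The memory-mapped file idea can be tried out with Python's standard `mmap` module. This is a generic illustration of the technique, not Firecracker's snapshot code: the file layout, the helper names `write_snapshot` and `restore_and_touch`, and the payloads are all invented for the example. The "snapshot" is a file as large as the guest's reserved memory in which only a few pages hold data. "Restoring" maps that file copy-on-write, so pages are read from the file when they are first touched, and writes go to private memory rather than back into the snapshot.

```python
import mmap
import os
import tempfile
from typing import Dict

PAGE = mmap.PAGESIZE


def write_snapshot(path: str, total_pages: int, used: Dict[int, bytes]) -> None:
    """Write a fake guest-memory file: total_pages long, with data only in `used` pages."""
    with open(path, "wb") as f:
        f.truncate(total_pages * PAGE)  # full reserved size; untouched pages read as zeros
        for page_no, payload in used.items():
            f.seek(page_no * PAGE)
            f.write(payload)


def restore_and_touch(path: str, page_no: int, new_bytes: bytes) -> bytes:
    """Map the snapshot copy-on-write, read the start of one page, then overwrite it privately."""
    with open(path, "rb") as f:
        mem = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    try:
        start = page_no * PAGE
        end = start + len(new_bytes)
        before = bytes(mem[start:end])
        mem[start:end] = new_bytes  # changes a private copy of the page, not the file
        return before
    finally:
        mem.close()


with tempfile.TemporaryDirectory() as workdir:
    snap = os.path.join(workdir, "guest_mem.snap")
    write_snapshot(snap, total_pages=1024, used={0: b"kernel", 7: b"app-heap"})
    print(os.path.getsize(snap) // PAGE, "pages reserved")
    print(restore_and_touch(snap, 7, b"APP-HEAP"))  # bytes as stored in the snapshot
    print(restore_and_touch(snap, 7, b"APP-HEAP"))  # same again: the file was not modified
```

Because the mapping is private, the same snapshot file can back many restored instances, each paying only for the pages it actually reads or changes. How much physical memory and disk that saves in practice depends on the operating system and file system.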

Jon Christensen: That’s really interesting. My mind is just going off on that whole thought of saving an application in its exact state, and sending it somewhere, and then it reads something from that state. Pretty cool. Pretty cool stuff you could do.

Chris Hickman: Yeah, absolutely. The thing about this stuff we’re talking about now is that this is all on the horizon. It’s new, there’s a lot of work going on here. Today, there’s probably not a lot of stuff that you really want to do with it. But going forward a year, two years down the road, who knows? We might see some dramatically different workflows with the support of other tooling, and other projects, and whatnot, to take advantage of some of these things. And we may see a really different way of applications being deployed, and hosted, and how they run out in the cloud, versus today. But still keeping our rich ecosystem, so that’s where the Firecracker project is going.
So why don’t we talk about, now that we know exactly what it is, and all these great things about it, let’s talk about, how do you use it? So there’s two things I want to talk about here. One is, so this was built by AWS. So let’s talk about how AWS is using it, because that’s one way that you do use it, is via proxy from using the AWS services that are using it. And then we’ll get into, what if you wanted to use it directly yourself?
So as far as AWS goes, we’ve already talked about it quite a bit. Lambda was definitely the first use case for this, and the first customer. And so Lambda has been using this since 2018, and, as we talked about, they are now doing trillions of function calls a month running Firecracker with Lambda. So very much proven, this is a real thing. In the past, we’ve speculated that Fargate is using Firecracker. I’ve actually seen a bunch of references saying that yes, Firecracker is being used by Fargate. Well it turns out, no it’s not. They’re still working on that. It’s very much actively being developed. I think they’re really close to the point where they’re ready to go to prod with Fargate using Firecracker. And when they do, I think we’ll know, even if they don’t tell us, we’ll know it, because there’s going to be a price drop, for sure, with Fargate.

Jon Christensen: And then also just everyone might be like, “Whoa, look at this. Look at how fast my tasks are starting and stopping.”

Chris Hickman: Yes, yes. Absolutely.

Jon Christensen: Wouldn’t that be cool? If you’ve got to spin up six containers, six tasks, and that’s not that many, but you’re just like, “Whoa. And now they’re up.” You press go, and it’s like, poof, they’re up. That’s not how it works today. So it’d be neat if that is how it started working.

Chris Hickman: Yeah. That’s an interesting point, is that right now, the way it works is, if you’re spinning up containers, if you’re using an EC2 launch type, the VMs already exist. So there’s not that overhead. You’re just creating a container inside of a VM that already exists. With Fargate, they’re typically, they’re always over-provisioned, for just exactly this case. So the odds of them needing to spin up a brand new VM for you, when you’re spinning up a Fargate container, are very, very low. Because that’s a bad experience. Think about how long it takes to instantiate an EC2, to bring it up. You’re talking about-

Jon Christensen: Long enough for getting a snack. Not really, but you know.

Chris Hickman: Yeah. It used to be six or eight minutes. Now it feels like it’s more like 90 seconds, or two minutes, but still, if it took that long for your process to start, your container to start, you wouldn’t be too happy. So they’re really over-provisioned there. So we get to reap the performance benefits of their overprovisioning, but it’s costing them a lot more in the way of resources to do that. And with Firecracker, they’re going to reap just tremendous benefits from not having to do that as much.

Jon Christensen: And honestly, I don’t know if it’s purposely put-in jitter, because they talk about how fast things are, but hasn’t your experience with ECS, with both Fargate and EC2, been that, if you’re swapping out a set of four tasks, or you’ve got four running containers, and you’re going to kill them all and put new ones in, that’s not very instantaneous, that it takes some time? You can watch it happen, and you’re like, “Okay, now it’s done.” And it feels like with the efficiency gains that we’re talking about, that could be nearly instantaneous. So I click the button, and, “Okay yeah, it’s done.”

Chris Hickman: Yeah. And so, Firecracker is not going to help with any of that, that you see right now.

Jon Christensen: You don’t think so?

Chris Hickman: No, absolutely. Because all that time-

Jon Christensen: Because of the over-provisioning?

Chris Hickman: No. All that time and delay is actually coming from the orchestration. One of the things that you have is just draining from ELBs. By default, it’s a five minute drain period from ELBs for your task. So it basically stops routing traffic to that task, that instance, as soon as you want to take it out. But it waits five minutes before killing it. So, just by default, it’s going to take you five minutes to deploy a new version of your container on ECS, if you don’t change that. And that’s to deal with the case of, what if you have long running transactions? So there’s a lot of complexity that’s just built in between the orchestration and the integration with other services. And so, like I said, the overhead of actually spinning up a VM, that performance benefit here, we’re not going to see much difference from a practical standpoint.
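For listeners who want to shorten the drain period Chris mentions, the setting lives on the load balancer's target group. The sketch below assumes an Application or Network Load Balancer target group, where the attribute is `deregistration_delay.timeout_seconds`, and assumes boto3 is installed and configured. The ARN and the helper function names are placeholders for illustration, not anything from the episode.

```python
DEFAULT_DRAIN_SECONDS = 300  # the five-minute default discussed in the episode


def drain_delay_request(target_group_arn: str, seconds: int) -> dict:
    """Build the arguments for ELBv2 modify_target_group_attributes."""
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            # Attribute values are passed as strings.
            {"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)},
        ],
    }


def apply_drain_delay(target_group_arn: str, seconds: int) -> None:
    """Send the change to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party; imported lazily so the sketch loads without it

    client = boto3.client("elbv2")
    client.modify_target_group_attributes(
        **drain_delay_request(target_group_arn, seconds)
    )


if __name__ == "__main__":
    # Placeholder ARN; printing the request rather than calling AWS.
    print(drain_delay_request("arn:aws:elasticloadbalancing:example", 30))
```

A shorter delay speeds up deployments, at the cost Chris points out: in-flight, long-running requests get less time to finish before the old task is stopped.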

Jon Christensen: That’s really unfortunate, because it’s where I want a lot of performance increases. Talking to developers, there was a time when things were simpler, where you make a change in your code, refresh your browser, and you see your change. And the loop is very, very short. And there are ways you can still make that happen, but there are some trade-offs around that happening, because you’re not really working in a real system when you do that. You’re not working in a distributed system. So you want to be able to test your code in the actual cloud, in its real environment.
And it just seems… Just my little rant is, it’d be so cool if I could click a button in my IDE, or hit enter at the end of a CLI command that basically said, replace what’s running in my ECS services with this new thing, and by the time I refresh my browser, it’s already done, and I’m already seeing the new thing. It’s real, and everything is still behind load balancers. Everything is still connected to real databases. Everything is still running on real Fargate machines with microVMs, but it’s all fairly instant.
I think that, if a cloud provider can get closer to that experience, everyone’s going to be stoked. Making the whole provisioning of new stuff faster, so that we can work with real stuff in real environments and change it, and see instant results, instead of minutes long results, would be so cool. That’s what I want. Do that.

Chris Hickman: I don’t know if you saw, many of the presentations at re:Invent had the triangle slide, the pyramid slide, showing their priorities. And top priority is always security, then comes availability, and then performance. So that might be a little bit of a hint of why they’ve made some of the tradeoffs, and why things work like this, and why we talk about things like blue, green deployments and canary deployments.

Jon Christensen: Right, right. And I guess while you wait for your CloudFormation to run is a nice time to have a lightsaber battle.

Chris Hickman: Yeah. The old meme, or joke, about, “Oh I got to compile, so go get coffee.” So there’s always got to be something where it’s like, “Oh got to go get coffee.”

Jon Christensen: CloudFormation today.

Chris Hickman: Yeah. All right, so moving on. So, that’s Lambda. So they are actively working on Fargate to use Firecracker, and this is one of those just really key important things for them to do. And the reason why is because with Fargate, again, we talked about that pyramid, and we said security for AWS is always top priority. What they’re doing now with Fargate is, for security reasons, they will only run a single task in a VM. They will spin up a new VM for every task. That is really expensive, and that is a lot of potentially wasted resources.
Think about if we did that. That means that we would have an EC2 instance for every single one of our tasks that we wanted to run, which gives up one of the great benefits of going to containers, which is that we can actually have that compaction. Where we can say, we don’t need to have that. We can have 10 of our containers run on a single EC2, but we’re in an environment where we’re allowed to do that. We can be multi tenant, because it’s all of our code. But with something like Fargate, there’s not that guarantee, it is basically multi tenant, where all AWS customers are using it.
They knew from the very get go, this is what we have to do. We have to have a dedicated VM for every task. So there’s a big overhead there. And so if they have tens of millions of Fargate task launches, that means that they have tens of millions of EC2 instance launches. So with Firecracker, once they have that in place, now they can run each one of those tasks inside a Firecracker VM. And those Firecracker VMs can now run on bare metal instances. And so, now you can run multiple tasks on the same instance securely. Now they can have the same tens of millions of Fargate task launches. Now there’s going to be tens of millions of Firecracker VM launches, but those are going to be much, much faster. And now they can have a much smaller set of bare metal instance launches to support that. So they’re going to get much, much better resource utilization there, and just the scalability that they get. So that’s definitely one of the things that they’re looking to do with this.
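To put rough numbers on the compaction argument above, here is a small Python sketch that compares "one VM per task" with packing the same tasks as microVMs onto shared bare metal hosts. The host capacity, the task sizes, and the simple first-fit placement are all assumptions for illustration. The episode does not describe how Fargate actually schedules tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    vcpus: float
    mem_gib: float


@dataclass
class Host:
    free_vcpus: float
    free_mem_gib: float
    tasks: int = 0


def pack_first_fit(tasks, host_vcpus, host_mem_gib):
    """Place each task on the first host with room, adding hosts as needed."""
    hosts = []
    for task in tasks:
        if task.vcpus > host_vcpus or task.mem_gib > host_mem_gib:
            raise ValueError("task is larger than an empty host")
        for host in hosts:
            if host.free_vcpus >= task.vcpus and host.free_mem_gib >= task.mem_gib:
                break
        else:
            # No existing host had room, so bring up another one.
            host = Host(host_vcpus, host_mem_gib)
            hosts.append(host)
        host.free_vcpus -= task.vcpus
        host.free_mem_gib -= task.mem_gib
        host.tasks += 1
    return hosts


if __name__ == "__main__":
    # Hypothetical workload: 1,000 small tasks on hypothetical 96 vCPU / 384 GiB hosts.
    tasks = [Task(vcpus=0.25, mem_gib=0.5) for _ in range(1000)]
    hosts = pack_first_fit(tasks, host_vcpus=96, host_mem_gib=384)
    print("dedicated VMs needed (one per task):", len(tasks))
    print("bare metal hosts needed (packed):   ", len(hosts))
```

With these made-up sizes, each host runs out of vCPUs after 384 tasks, so the 1,000 tasks need three hosts instead of 1,000 separate VM launches.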
We’ve talked a lot about security. Because it’s a virtual machine, there is that strong hardware isolation boundary through the hardware virtualization. And so that allows them to do this, to run on bare metal and have multiple tasks running on the same bare metal machine. There’s some architecture details as well, about how they get to implement this. With Fargate, you have a customer ENI for your task. We talked about this back in the Fargate mini series, about setting up the task networking in awsvpc mode, where you’re basically spinning up an ENI inside your VPC. That’s the customer ENI. So there’s that. And then there’s also a Fargate ENI. So there’s a Fargate agent. So, just like we have ECS agents for the EC2 launch type, there’s the same concept in Fargate. There’s some agent on that machine that’s talking to the Fargate control plane. And in the current environment, there needs to be a Fargate ENI per task, a Fargate agent for each task, and that ends up being lots of overhead. Versus now, with Firecracker, they’re going to be able to have a Fargate agent per bare metal instance.

Jon Christensen: Interesting.

Chris Hickman: Right? So, there’s some pretty interesting architecture diagrams out there for both of these situations, where it shows how it actually is stacked together and works. But I think we can keep it at the level of saying, Firecracker is really allowing for not just this strong security and isolation, but it’s also very real in the way of really reducing the amount of resources they need. So they don’t need nearly as many ENIs dedicated for Fargate anymore. And they don’t have to have all these agents. They don’t take up a lot necessarily, but it all adds up when you’re talking about tens of millions of launches. So if it’s 10 megabytes or whatever it is, multiply that by tens of millions, and that’s something that’s actually a big number.
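Chris's back-of-the-envelope math is easy to run. The 10 megabytes is his "or whatever it is" figure, and the launch counts here are illustrative stand-ins for "tens of millions", so treat the result as an order of magnitude only.

```python
def total_overhead_tb(per_task_mb: float, launches: int) -> float:
    """Total per-task overhead in terabytes (decimal units: 1 TB = 1,000,000 MB)."""
    return per_task_mb * launches / 1_000_000


if __name__ == "__main__":
    for launches in (10_000_000, 50_000_000):
        tb = total_overhead_tb(10, launches)
        print(f"{launches:,} launches x 10 MB = {tb:,.0f} TB")
```

So even a small per-task agent footprint lands in the range of a hundred terabytes or more at that scale.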

Jon Christensen: Right. And talk about… Just the business opportunity for them with this is so incredible, because, say they are able to run all the tasks on half as many machines as they needed before. So they’re saving half, and then they could pass on a quarter of that savings to their customers, and all the customers would be like, “Sweet!” And then the customers are happy, and they’re making more money than they were before. What a cash cow that’ll be.

Chris Hickman: Absolutely. Yeah. This is a very strategic, very important, project for them. Talk about ROI.

Jon Christensen: That’s what I’m talking about.

Chris Hickman: This is massive, just massive ROI. And this is so much, for just going forward into the future as well. This is going to pay off in spades. And the interesting thing too, I bet you the teams are not that big, so you talk about how many two pizza teams are involved with this? I would think we’d be surprised at just how few it is, versus how much leverage it’s giving AWS, what that return is. So pretty cool.

Jon Christensen: Agreed.

Chris Hickman: Yeah. And then obviously, lots of operational efficiency gains they’re going to get in Fargate through Firecracker. We’ve touched on them. So you can now tune this to be the exact CPU and memory needed with rightsizing. They no longer have to… They can have a homogenous fleet of bare metal instances. Right now they have to have different instance sizes, because they’re doing it at the EC2 level, and so the different EC2 families have different amounts of CPU and memory. So you can’t run a task that requires two gigs of memory on a T2, it needs to be something bigger. So they have to have this mixed fleet of instance types right now in order to support that. And so they have complicated code to figure out where it gets scheduled and run. Versus with Firecracker, they can do that at the microVM level, and they can have just these bare metal instances and they can come back-

Jon Christensen: Just big ones. Just go with big old servers.

Chris Hickman: Cool. So that’s how AWS is using Firecracker, and obviously getting great gains out of it. So the real question then is, how do we use it? What does this mean to us? Other than if we’re using Fargate or Lambda, how can we take advantage of Firecracker?

Jon Christensen: We can make a multi-cloud function runner, or container runner, and have that be our new company.

Chris Hickman: There you go. Yeah, you absolutely could. And who knows? I wouldn’t be surprised if there’re projects out there that are doing… I know there’s a bunch of functions as a service, open source things and whatnot, so who knows?
We already talked about this a little bit, Firecracker in and of itself is just a VM. It’s a VMM, virtual machine monitor. So you can use it as a VM, so you can use their tooling to create VMs, and tell them what you want your guest OS to be, and you can tell it how much memory and CPU to use. Like I said, if you wanted to integrate that in with your application set, then it becomes a lot of work now, to do that orchestration. So it’s just a lot of work to be able to just consume Firecracker in and of itself.
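To give a feel for what "using their tooling to create VMs" looks like, here is a hedged Python sketch of the calls involved. Firecracker exposes an HTTP API over a Unix domain socket. The endpoint names and fields below follow the project's getting-started documentation as best I recall and may differ between versions, and the kernel path, root filesystem path, and socket path are placeholders. It assumes a `firecracker` process is already running and listening on that socket.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self._socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self._socket_path)


def microvm_requests(kernel_path: str, rootfs_path: str, vcpus: int, mem_mib: int):
    """Return the ordered API calls that configure and start one microVM."""
    return [
        # How much CPU and memory the guest gets.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # Which guest kernel to boot, and with what kernel command line.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # The root filesystem image for the guest OS.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Finally, boot it.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def boot_microvm(socket_path: str, requests) -> None:
    """Send each request in order, stopping on the first error response."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for method, path, body in requests:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder paths; prints the plan rather than contacting a real socket.
    for step in microvm_requests("/tmp/vmlinux", "/tmp/rootfs.ext4", 2, 1024):
        print(step)
```

This only gets you a single booted VM. Everything Chris lists next, such as images, scheduling, and networking for containers, is the orchestration work that still sits on top.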

Jon Christensen: Sure.

Chris Hickman: So there’re projects out there that are working, again, to integrate into this container ecosystem, because that’s what we want. We want our containers, remember, we had our chant last time? What do we want? Performance and efficiency. How do we want it? Secure and isolated. And so that’s what these projects are doing. So the Firecracker team, they’re working on the containerd integration. So they are working on a Firecracker version of containerd. It’s a replacement for the standard containerd daemon. It’s actually custom compiled to have one of the control modules that’s needed for Firecracker. So there’s a bit of work there for whoever does want to use this. You have to go in and have this custom compiled version of containerd just for Firecracker, but it’s compliant with it. And you can now use either runC or Firecracker as your low-level runtime for containerd, so if you use firecracker-containerd, then having Firecracker as a low-level runtime is now an option in lieu of runC.

Jon Christensen: Ah, interesting. So, that is the thing that I was alluding to earlier. What if we had Firecracker images in a Firecracker image repository? So that’s what that’s for?

Chris Hickman: Kind of.

Jon Christensen: Moving in that direction.

Chris Hickman: You’re still dealing with container images. It’s just that the low level runtime that normally is dealing with just containers, it’s been updated to be smart enough to say, “Before I create my container, I first need to create a VM in which to place it.”

Jon Christensen: Okay. Okay, nevermind.

Chris Hickman: It’s doing some value added work there. But everything above it is just saying, “Create a container,” so that’s how it plugs into this tooling, so that you, as a user of it, you don’t have to really know anything about Firecracker. You’re just containers, and all of your tooling above that level, all it sees is containers.

Jon Christensen: So you would use that, if you are running on prem, and you were making your own little platform as a service, and you realized, that you had been doing it with Kubernetes before, and using regular old Docker containers and you’re like, “Actually this is dangerous, because our platform as a service has different applications running on it that really we should be keeping more isolated from each other. Let’s pull in Firecracker to be our runtime for the containers.” You can use firecracker-containerd, and now all of a sudden, all your little containers are running inside Firecracker VMs, and are more isolated from one another. Yes?

Chris Hickman: Yes. That’s definitely one of the big use cases that we’re using microVMs for, is for that strong isolation. So if folks are-

Jon Christensen: And I’d like to go to that platform as a service example immediately, because everyone gets it. Oh, I can imagine, Capital One has their own platform as a service for their own internal apps. It just makes sense that they might have that.

Chris Hickman: Yeah, absolutely. So that’s containerd for Firecracker, and that gives us the integration in with our normal Docker tooling, if you will. There are some other projects out there doing the same thing. So there’s something called Ignite from Weave, and basically this is the same thing, where it’s now allowing you to run Docker and OCI images within the Firecracker VM. So it’s just a different approach. Basically it’s hooking into that same place. And then they’ve also gone a step further, and they have something they built called Firekube. And what Firekube is, is now it’s hooking into Kubernetes, and its container runtime interface, CRI, to now work with Firecracker.

Jon Christensen: As Kubernetes is deploying pods and running them, it’s actually making Firecracker VMs with containers in them.

Chris Hickman: Yep. And so it truly is, probably at this point, the easiest way. If you wanted to consume Firecracker yourself, for your own self hosted container runtimes, then definitely the most mature option now is if you are running Kubernetes. So using something like Firekube and Ignite, it’s there, it’s real, it’s live, and it plugs right in and you can use it. Versus on the Docker side, I guess with Swarm… There’s no equivalent thing for Swarm, and then I guess ECS, we know what’s going on there. So that’s Fargate. So I guess that does cover most of it from an orchestration landscape.

Jon Christensen: Yep. Super cool. I don’t see myself cracking open Firecracker, to use the word cracker twice, anytime soon. But it’s really fascinating. I think the thing that I get out of this, that I love so much, is just that sense of directionality in everything that’s happening in the cloud. When you see, and understand containers, and then you see, and understand container orchestration, and also before that VMs, and then now microVMs, it’s like, “Oh yeah, this is all moving in a particular direction of more isolation, more efficiency and better tooling.” Everything is just refining those three things.

Chris Hickman: Absolutely. This is all just evolution, it’s not revolutionary. This is not stuff really… That’s just… It’s not even hard. You can totally connect the dots. It makes so much sense that this is where we have gone. And the interesting thing, again, it’s all being driven by cloud. It’s just all being driven by that. So to me that’s interesting. It used to be PCs were driving the innovation, and now, I don’t know what the numbers are, but I don’t think people buy desktop computers hardly at all anymore. Smart phones have replaced that. And other kinds of connected devices on the personal side. And then on the server side, it’s all cloud.

Jon Christensen: Yeah. Well awesome. I know we have more microVMs to talk about. So maybe we’ll talk about the rest of it next week?

Chris Hickman: Yeah, I think we covered a lot today, and we spent a lot of time in Firecracker, but it is one of the front runners here, and really important to just go through and talk about that to really understand what microVMs are.
I think next time we’ll pick up, we’ll talk a bit about Kata containers and, spoiler alert, Kata containers, in looking at this, it was a little bit confusing to me. Because they claim that they’re a lightweight virtual machine, but it turns out, I think you can think of them more as, this is another type of runC replacement. Because Kata containers actually require some other VMM, some other hypervisor. But this is one of the other popular projects out there. We’ll talk about that.
And then I think that’ll be the time where we can really dive into unikernels as well, because unikernels really align well with some of the same goals that microVMs have, and use cases, but they do it in a completely different way.

Jon Christensen: Cool. Can’t wait.

Chris Hickman: All right. Thanks, Jon.

Jon Christensen: All right. Yeah, thanks, talk to you next week.

Chris Hickman: See ya.

Jon Christensen: Bye.

Stevie Rose: Thanks for being aboard with us on this week’s episode of Mobycast. Also, thanks to our producer, Roy England. And I’m our announcer, Stevie Rose. Come talk to us on Mobycast.fm, or on Reddit, at r/mobycast.
