06. How to Create Docker Containers (Part 1)

In Part 1 of a technical series about creating containers, Chris Hickman and Jon Christensen of Kelsus and Rich Staats of Secret Stache discuss how to configure a Dockerfile, including base images and volume mounts.

Some of the highlights of the show include:

  • Kelsus uses Docker in all environments throughout the deployment pipeline: on each developer’s machine, and also in development, staging, and production.
  • It is also possible to run Docker only in local developer environments, which can eliminate problems where builds don’t work consistently on different people’s machines. (“But it works on MY machine…”) This might be a good option if you have a mature CI/CD pipeline in place today that doesn’t use Docker.
  • You can configure containers differently depending on the environment they are running in: dev, staging, or production.
  • One of the major benefits of containers is that they are isolated. This helps with security, decoupling, optimizing your compute resources, and other factors.
  • At first, developers may find containers difficult to deal with because they *think* they must ssh into the container in order to see files, check logs, troubleshoot, and make code updates.
  • Volume mounts are a good way to solve this problem so developers can work in the same efficient way they are used to. A volume mount maps a file system path inside the container to a path on the host machine’s file system; if those paths line up, developers can easily check log files, update code, and do hot deployments the same way they always have.
  • The downside of volume mounts is that you are sharing quite a bit of the host system for ease of development. If you have code and other files that live only on the local machine that are not checked in to source control, the app may not work on any other machine. 
  • You can mitigate the above issue by using different container configurations for local environments versus deployed environments; use shared volume mounts only in local environments. In development mode, run your Docker container with a file system mount. Before you commit changes to the source code repo, you need to run it in ‘non-promiscuous’ mode without the volume mount to verify it is working.
  • Docker Compose is a tool that allows you to run multiple containers as a cohesive unit. It defines how containers should be started, configured, and so on; it’s like automating your command-line statements for running Docker. (A minimal sketch of a Compose file with separate dev and production-style services appears after this list.)
  • Volume mounts are defined outside of your Dockerfile – either on the command line or in your Docker Compose file, but not in the Dockerfile itself. Docker images are completely portable, so you would never bake in a file system mount, because there is no guarantee that a particular path exists on a host. It has to be specified at runtime.
  • Kelsus recommends that you always use a clean build machine to build your Docker image, rather than a developer’s machine.
  • The Dockerfile defines what operating system and other dependencies will be bundled into your Docker image. You may create your own Docker image from scratch, or you can use someone else’s Docker image as a base and then, if needed, make changes to it.
  • When you decide what base image to start from, and thus which software packages will be bundled into the image, it is important to consider trade-offs between security, stability, and the ease of using a pre-built image.  If you’re using someone else’s image as a base, do you really know that everything included in it is secure and stable? Does it include things that you don’t need?
  • For situations where we have a fairly consistent tech stack, Kelsus uses a common base image that we maintain ourselves.
  • The more things you include in your base image, the faster you can build your container, deploy, and run tests, so maintaining a solid base image is important for CI/CD.
  • There is a tradeoff between fast builds and maintaining your base image, as you will periodically want to update the versions of packages you include in your container, such as npm packages, ruby gems, or other software libraries. You should expect to update and maintain your base image regularly throughout the life of your software, and developers need to ensure they are using the correct base image.
  • Some people prefer to always get the latest version of a library, and therefore exclude those libraries from the base image and instead pull them in when the image is built, but that can lead to instability, as new versions may introduce defects.
  • Docker images are published to registries such as Docker Hub and ECR (Amazon Elastic Container Registry). Images are typically published along with their Dockerfile, which defines how the image is built. Images are versioned with a tag; however, it is possible for an image author to re-publish a new image under the same tag.
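
As a rough illustration of the local-versus-deployed split described above, here is a minimal, hypothetical docker-compose.yml; the service names, image name, port, and paths are made up for the example, and only the dev service uses volume mounts:

    version: "3"
    services:
      myapp:                          # clean mode: runs only what is baked into the image
        image: myapp:latest
        ports:
          - "3000:3000"
      myapp-dev:                      # dev mode: source and logs shared with the host
        image: myapp:latest
        ports:
          - "3000:3000"
        volumes:
          - ./src:/usr/src/app/src    # edit code on the host, hot reload inside the container
          - ./logs:/usr/src/app/logs  # tail log files from the host

Running “docker-compose up myapp-dev” gives the developer-friendly behavior, while “docker-compose up myapp” exercises the image as it would run when deployed.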

Links and Resources

Docker

Kelsus

Secret Stache Media

Python

Ruby on Rails

GitHub

ECR

Rich: In episode six of Mobycast, we introduce Part 1 of a technical series around the creation of containers, specifically Jon and Chris teach me about base images and volume mounts. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.

Jon: What we’re gonna talk about today is we’re gonna go a little deeper, a little more technical. We’re gonna talk about containers themselves and how we have been setting them up. We’re gonna talk about the Dockerfile and how we configure it, what we like to do, what features of Docker we make use of, because there are about 1.3 billion features of Docker and I think we use a solid 15, and I wanna talk about at least a few of those. What makes our Docker setup work well for us?

To get started, I think the first thing that makes sense—and this is a good idea that Rich had—is instead of just jumping right in and saying, “This is our Dockerfile and this is how we set it up,” we should probably say we use containers in different ways. Maybe Chris, you can tell us a little bit about how we use containers.

Chris: Sure. Containers, you can run them anywhere across the full spectrum of your pipeline: on your local dev machine for the dev-test-integrate cycle, through automated testing on your test platforms, and then of course actually using containers for deploying your services.

Companies, teams, they can use any mix of that, and it really comes down to what features and benefits you’re looking for, and what the missing pieces are, to dictate which parts you may adopt. Maybe the important takeaway here is that you don’t have to use Docker in all of those phases. Sometimes it might make sense where you’re really looking for that feature of getting away from “it works on my machine,” or having a way for your product managers to actually bring up the latest codebase themselves on their laptop or something like that. Running Docker locally might make sense there.

But if you already have a rich system in place for doing CI and you have a different way of doing deployment, you may not wanna do that.

Jon: Hearing you say that we run containers on our dev machines to do development and to do local testing, and I think you also said that we run containers in staging, or on a test machine, or in production, would it be fair for me to say that in each of those different scenarios, or you could call them container use cases, our containers are configured to run exactly the same way? Or would we have a situation where we’ve actually built our containers differently to fit the use case they’re running in?

Chris: Absolutely. How you configure and orchestrate your containers will definitely be different, depending on your use case. We’ve talked about this before in previous episodes: one of the major benefits of containers is that they’re isolated. They’re basically an island, a hermetically sealed environment for your application, if you will, where it knows very little about the outside world and vice versa. There are some great benefits to that, as far as security, decoupling, and being able to run many of these things on the same machine, each thinking it’s the only one there.

One of the first things that developers run into is, wow, this is super difficult to deal with. Maybe I’m outputting logs to standard out and standard error. Before, I would just see them on my command line when I ran my code; now, if I’m running it under Docker, that’s actually inside the Docker container. Now I have to do something different. I may have to shell into the Docker container like I would shell into a remote machine. That gets to be, “Okay, this is different and this feels like a hurdle.”

There are other hurdles in that process as well, especially with hot reloading. This is a really painful part of the process: if you are Dockerizing a client application, you write React code or JavaScript code and you wanna go change even a div tag or something like that, and the first reaction out of the box would be, “Wait a minute, I have to rebuild my Docker image to see that one little change? This is really, really painful.”

Definitely in the development environment there are ways to alleviate this pain, and so you end up punching some holes into that isolated container to allow these use cases and make it more developer friendly. That said, you need to know what repercussions that may have. We can talk a bit more about that as we go through this, but there are some tradeoffs, because you’re now giving up some of the great features of Docker in exchange for better productivity for your developer.

Jon: Right, and you’re getting into exactly what I wanted to start to get into. We’re dealing with setting up our containers. We set them up in a way that Google or Stack Overflow said would work, we start developing, and we’re hating life.

One of the first features of Docker that we use is this ability to punch holes into the container so that we can stop hating life. Maybe we’d love life; maybe all of a sudden, now that we have to wait a minute and a half between each div change, we get to binge a show on Netflix. But really, we’ve lost our productivity.

Let’s talk about how you punch holes into the container. And how do you make sure that you only do that in dev? Because it does sound sort of dangerous to do that anywhere but dev.

Chris: Yeah, absolutely. One of the more common ways to do that is through volume mounts. Volume mounts in Docker allow your container to share data with its host. Remember, all containers require a host system on which to run, on which the Docker daemon is running and spinning up your Docker container processes. You always have a host computer. If you’re running locally, your Macbook is gonna be your host computer, so the local drive on your Macbook would be the host file system. When you spin up your container you can say, “Hey, I wanna have a volume mount. I’m gonna map a file system path inside the container to a file system path on my host.” That means now, whenever I read or write from that path inside the container, it’s actually reading and writing from the same location on the host file system.

Volume mounts are definitely one of the core techniques here for doing this. It’s not even just for hot reloading or developer productivity; it’s also for getting insight into what’s going on. A really great use case for this is log files. Your application is almost certainly generating log files to indicate what’s going on in the system, so you know when warnings or errors are happening. You can punch this hole in the container to say, “Hey, write these log files out to my host system so that as a developer I can tail those log files from my command line.” I don’t actually have to go inside the container to see them, and they can also persist across sessions.
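
As a concrete sketch of that idea, a volume mount is passed at run time with the -v flag on docker run; the image name and paths here are hypothetical:

    # Map a host directory into the container so the app's log files land on the host
    docker run -d \
      -v /Users/me/myapp/logs:/var/log/myapp \
      myapp:latest

    # The logs can now be tailed from the host, without going inside the container
    tail -f /Users/me/myapp/logs/app.log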

Jon: With volume mounts, this gets really confusing to me, because now all of a sudden it almost feels like we’re developing on a local machine again. We’re making hot deploys and all of our logs are coming to a place where we don’t have to go into the container to see them. I can be developing; I’m a developer, I’ve been working away for three or four hours, maybe I’ve even done three or four features during that time. How do I know when it’s time to do a build for real? At this point it’s like I’m not even really using the container anymore; the container is just sort of running the code, but I’m doing the code on my local machine. When does a developer know to stop doing the hot deploy stuff and make a build?

Chris: It’s a great question. We can probably talk a little bit about, like, what’s the harm? We have this volume mount. Let’s talk about the hot reloading portion of this. As a developer I wanna make some tweaks and have that automatically get picked up; my code runs inside of a Docker container, and I want the change to be reflected instantly, so that now on the screen I can see what happened without doing a Docker image rebuild.

My volume mount is basically sharing the same directory, the same file path where the actual source code is, so now I can edit the source code on my host machine. Because that is now shared via the volume mount with the container, you set up the process running inside your […] to have hot reloading through a file system watcher, so that when a change happens on the file system it just […] say, I’m gonna recompile, […] or I’m gonna restart Node or something like that, whatever the case may be. Those changes immediately happen. Even though the code still runs inside the container, I now see the differences. That sounds pretty good.

The downside to this is that you’re sharing quite a bit of the host system. The surface area’s pretty broad; it’s basically your whole source code tree. The big danger here is, what if you now have artifacts, pieces of code, various files that perhaps aren’t checked into your source repo, that are only on that one local machine, such that, “Hey, it works fine on my machine.” You’re running inside this container that’s sharing this volume mount for ease of development, you run your tests, everything’s working great, you check it in and push it to the remote source code repo and you’re like, “Hey, job well done.” Someone else on your team does a pull. They run it and immediately it’s broken, it doesn’t work. That’s because there were specific files only on your machine that you forgot to check in. You didn’t notice, because your container had access to those files outside of its process space. That’s the downside of doing this.

The way you get around this is you do a hybrid. Using Docker Compose, you can set up multiple different containers, and one of the things you like to do is have one container definition for when you’re doing this dev iteration process, and you call that “myapp-dev” or something like that. It makes you a bit more mindful that you’re running the dev version of it, and that particular definition would have the volume mount defined.

Jon: Chris, let me stop you for a second. You just introduced something called Docker Compose. My understanding is that Docker Compose will let me create multiple Dockerfiles and each Dockerfile defines my container?

Chris: Docker Compose is a tool that allows you to run multiple containers as a cohesive unit. You don’t have to run all of them; you can pick and choose which ones you wanna run. It’s a way of automating the Docker command line for starting and stopping services and whatnot. Docker Compose is like this value-add service, a wrapper on top of that stuff.

Jon: Okay.

Chris: It allows you to easily define how these containers should be started, how these should be configured, that type of thing.

Jon: That’s the part that I’m kind of confused on. We’re punching the hole in the file system, or in the container, by using these volume mounts. I thought that was something you have to do in your Dockerfile. You have to say, “Dockerfile, punch a hole for me please.” This Docker Compose, is it creating one version of the Dockerfile that uses volume mounts and another version of a Dockerfile that doesn’t, or is it doing it some other way?

Chris: That volume mount is actually defined outside the image. You could do it by hand: when you do a docker run command, you would specify a command line flag saying, “Here’s the volume mount I wanna have for this container.” Likewise, if you’re using Docker Compose, inside your compose file you would say, “Okay, here’s a container I’m defining, here’s its name, here’s the Docker image to use. Oh, and here are the volume mounts.” You can have more than one volume mount, too. It’s done above and beyond the image. It’s part of its runtime environment.

Jon: Why did I think that you could do volume mounts inside the Dockerfile? Maybe you can do them in there too, if you know that that’s what you want every time.

Chris: Maybe what’s confusing is that you have two different locations. The Dockerfile, when it builds a Docker image, has to know about the host file system as well as the eventual container file system. You’re basically defining how that container should look when it does get instantiated from the Docker image.

There are two different file system spaces when you’re dealing with it, because you’re basically saying, “Hey, my source files are coming in from my host, so that has one path. I want to write them inside the image, and that’s a different path.” In a way, you already have this volume mount, if you will, when you’re building Docker images, but that’s just for building the bits. Remember, Docker images are totally portable, so we would never bake in a file system mount, because there’s no guarantee that that path would even exist.

Jon: Yeah.

Chris: […] run that image on.

Jon: Right.

Chris: It has to be specified at run time whether you want that or not.
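
To make those two path spaces concrete, here is a minimal, hypothetical Dockerfile snippet; COPY reads from a path on the build machine (the build context) and writes to a path inside the image, while any bind mount of that same source tree is specified separately at run time:

    FROM node:8                 # base image (illustrative tag)
    WORKDIR /usr/src/app        # path inside the image
    COPY package.json .         # host path (build context) -> image path
    RUN npm install
    COPY ./src ./src            # bake the source into the image; no volume mount is defined here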

Jon: Okay. Cool, that’s a good explanation, thanks. I forgot where we were.

Chris: We were kind of saying that during development mode, you would have one way of configuring your container. You would say, when I’m in development mode, I’m gonna run my Docker container with this file system mount, and I’m gonna iterate and test and everything. But before I actually commit my changes to my source code repo, I have to run it in that non-promiscuous mode. I have to run my container without the volume mount and verify that it’s working. That is also the mode in which your build machine or your CI machine is running its tests. You have these other safeguards in place along the way to make sure that you are indeed building a clean, pure image.
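
Assuming the hypothetical Compose file sketched earlier, that workflow might look roughly like this:

    # Day-to-day iteration: run the dev service, with the volume mount and hot reloading
    docker-compose up myapp-dev

    # Before committing: rebuild the image and run it (and the tests) without the mount
    docker-compose build myapp
    docker-compose run --rm myapp npm test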

Jon: Going back to my original question of when a developer would do that, they would probably do it when they would normally do it: whenever they feel like they’re ready to ship to the rest of the team.

Chris: Exactly. When they’re ready to commit to the remote repo. Then of course, if they didn’t do that, and your CI machine runs its tests and it breaks the build, everyone else is like, “Ahh, Jon didn’t do it. Jon was lazy, he didn’t run the tests. He didn’t do it the right way.”

Jon: Right, right. Now we know about being able to do these volume mounts to either do hot deploys or get your logs without having to shell into your running container. I think another topic that covers how we make our containers is just: where do we start? You can start with just a plain old Unix image with nothing on it, or you can start with images that already have stuff on them, like a database, or maybe Nginx, or some other things. How do we decide where to start?

Chris: Absolutely, yes. It’s another great question and kind of an area with lots of discussion around it. When you’re building a Docker image, one of the first things you have to decide is like what’s my base image, what am I starting from? Typically you need to start with just, at the very least, like what operating system, what flavor of an operating system are you starting with?

The choice of where to start depends; you have to take into account various factors. Things like ease of use, what packages are installed, and how much I have to maintain versus how much someone else is maintaining. You can think of it as how much you bring in. You can go with a very, very small base image, which means the surface area’s much smaller, which gives you perhaps much better performance, stability, and definitely security. But the cost, the trade off, is that now you have to do a lot more yourself. You may have to do a bunch of additional steps to install the necessary software that you need; maybe you need a compiler and you now have to build that into your Docker image yourself, manually. You have to figure that out.

Versus, you can start with an image that someone else has made, like you said, maybe specifically for Python, or maybe for running […] or something like that. You can start with that kind of image, but again the trade off is that when you bring it in, the surface area is so much bigger. There’s so much more software running inside that base image that now you’re opening yourself up to more security vulnerabilities, and you have the issue of, am I up-to-date? What am I actually pulling in?

It also puts more onus on you to really understand: when you derive from that base image, you have to go understand what that base image is doing. There’s some responsibility there, just as if you were to pull in any other piece of open-source code. If you go get some module from npm, or a RubyGem, or something like that, it’s your responsibility to know what’s in that code. The same issues come into play with base images.
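
For illustration, the two ends of that spectrum might look like the following alternative Dockerfile starting points (tags and package names are just examples):

    # Option 1: small base image. Tiny surface area, but you install what you need yourself.
    FROM alpine:3.8
    RUN apk add --no-cache python3

    # Option 2: pre-built language image. Convenient, but you inherit everything its author included.
    FROM python:3.6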

Jon: You said something that I’m not sure I agree with. Let me just try to say it back: you talked about the surface area being smaller or bigger depending on your base image. If you need Python on there, at the end of the day, whether Python was pre-installed or whether you compile and install it on a Docker image that didn’t have it, you still end up with a container that has Python on it. What is the difference? Security-wise, I wouldn’t imagine there is one.

Chris: If that was the only difference, then sure, there isn’t one. But realize that maybe the Python image you took is based on the […] distribution. That not only has Python on it, but maybe it also has Java, the JVM, on it, and maybe some networking services that you really shouldn’t have installed on there. That’s where it comes into play; you have to go look to see what it is that you’re inheriting, and what base image you’re using.

Jon: If you choose a base image from somebody else, it could be trustworthy, but it comes with whatever it comes with, and if it’s got stuff you don’t need on it, that’s probably a no-no for production.

Chris: Yup, absolutely.

Jon: […] system.

Rich: Can I jump in with a question too?

Jon: Sure.

Chris: Sure, sure.

Jon: Go ahead, Rich.

Rich: You have a new client and it’s a new product, so you’re not inheriting a legacy product; you’re building from scratch. You’ve gone through the strategy sessions and you start to think through how you’re gonna build this. Do you guys, as the Kelsus team, start with maybe just bullet-pointing what the base image needs to be? Is that how you start the project, by building a base image which all the developers will use? Or do you evolve that image over time as the complexity of it increases?

I know that you’re probably always going to have to evolve it, but I’m thinking about efficiency: does Kelsus just have a base image that they use for most of their projects, or at least do they start all their projects with an idea of how the base image might look? Or are you always doing that alongside development?

Chris: Yeah. That process is definitely evolving, and it depends on the project itself and whether it’s an existing client where you have more of a track record, where we know exactly what the requirements are and what the environment should look like, versus a new one.

For one of our bigger clients with multiple projects, we’ve definitely gone down this road of having a common base image that we use across all of them. We’re very much a Node.js shop on the backend, […] usually for backing stores, and then React for our front-end clients. It’s a pretty common tech stack across those projects. We also have some common shared code libraries that we use across those projects that are specific to that particular client. We definitely have gone towards having a standard base image for that, one that includes the general setup and config for that type of environment.

Jon: Is the image one that we grab from somewhere or is it one that we’re maintaining?

Chris: Right, it’s definitely one that we’re maintaining. There are some benefits there. It definitely can speed up your builds: you identify the parts that are common so you don’t have to keep repeating them over and over for every single one of the projects. If you know that you’re gonna be doing an npm install of these 10 common modules that pull in these other dependencies, and every project is gonna use them and know those underlying dependencies are there, then why have all 10 projects do that over and over themselves? Instead, have one common image with that baked into it, and have those projects inherit off that so they don’t have to do that work.

It speeds that up. It definitely makes the Dockerfiles more modular. But the cost is that you now have to maintain the base image, so you need to update it periodically, and likewise you need to make sure that everyone inheriting off that base image is picking up the new updates. If they’re locked into a specific version number or tag, then they need to change that. There’s some additional maintenance there, but a lot of benefits with it.
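
A rough sketch of that pattern, with made-up image, file, and package names: the shared base image pre-installs the common dependencies once, and each project’s Dockerfile inherits from it.

    # Dockerfile for the shared base image (built and pushed once, maintained by the team)
    FROM node:8
    WORKDIR /usr/src/app
    COPY package-common.json ./package.json   # the "10 common modules", hypothetical file
    RUN npm install                           # bake those dependencies into an image layer

    # Dockerfile for an individual project
    FROM kelsus/common-base:1.2.0             # hypothetical name and tag for the shared image
    WORKDIR /usr/src/app
    COPY . .
    RUN npm install                           # project-specific dependencies layered on top
    CMD ["node", "server.js"]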

Jon: Sorry, Rich, I just wanted to take a shot at a rule of thumb, because to me it feels like the biggest trade off is maintaining your base image, and all the things that might be on it, versus waiting for containers to build. Because you can always build images with whatever you want on them.

When I say waiting for containers to build, the benefit is you’re always up-to-date. During that process, you can just always grab the latest of everything so that you never fall behind. That seems to be the trade off, and it feels to me like a good rule of thumb is: is the stuff on the container heavy to build, time-consuming stuff? I’m talking about things like ImageMagick, which takes forever, or Nginx, which takes forever to build. Those are multi-minute builds. I think ImageMagick may take like 10 minutes on a full […] computer; I don’t know how long it would take on a stripped-down container. If you’re sitting around waiting 25 minutes for containers to get created, that’s gonna start to create bottlenecks in your CI/CD pipeline.

Chris: Absolutely.

Jon: That tradeoff has to be just right where as much as you can get built on the fly without waiting forever, it seems like you should, just have a good built by the container, then you know you’re always up-to-date, you have less service area to maintain. But anything that’s sort of hard, that takes a long time, probably should build it in, probably should just bite the bullet and know that every once in a while, I gotta go check and see if there’s a new ImageMagick or new whatever else it is that you’re maintaining on there.

Chris: Paradoxically, there’s actually some advantages sometimes to not being completely up-to-date.

Jon: Oh, sure, sure. Yeah.

Chris: This is actually one of the arguments for having a common base image where things are locked in. This is especially true in the land of Node, where you have npm modules that you’re getting from the open source community, and those are changing all the time. People are supposed to follow semver, but they don’t have to, and some of those authors don’t even understand semver. You may find that by saying, “Hey, I want the latest minor version,” it actually breaks your code.

Jon: Right, right.

Chris: You didn’t do anything. No code was checked in at all and no changes were made, but just by virtue of doing a new build, it’s now broken and it doesn’t work. So sometimes, locking yourself in with the base image on certain things like that, there are some advantages to it.

Jon: Yeah. I just really wanna drive that point home, because I had a startup at one point, a Ruby on Rails startup, and gems just change all the time. They were in constant flux, and we had a developer who sort of insisted on always being up-to-date, because he never wanted to deal with the pain of getting too far behind and the amount of churn that could cause. But this was one of those startups where we were working in our spare time, and eventually I was just like, “Dude, you have to stop doing this, because every time you sit down with the code, you spend your first hour and a half fixing all the broken stuff that came from the gem updates. Stop. Let’s let them sit for a while.”

Rich, I’m sorry, we cut you off a little bit ago. Is your question still there?

Rich: Yeah, it’s more of a practicality question. I imagine, much like Git, the remote repo that houses these base images could be forked too. You could have a base image that really is a starting point for projects, and then you can fork it and create a specific one for that project. Is that true?

Chris: I’m sorry, so you’re saying, as far as starting with someone else’s base image, forking it, and that becomes yours, was that your question?

Rich: Yeah. I mean, is it similar to the way that open source code in Git would live, where if I like it, I can just fork it? That’s the starting point.

Chris: Absolutely. What you can do is this: Docker images, you refer to specific Docker images basically by their repo name as well as a tag. You can reuse the same tag, and a common convention is “latest.” You can say, “I’m gonna go get my base image, colon, latest.” That way, whenever I build, theoretically I’m getting whatever the current version of that Docker image is. Conversely, you can lock it into a very specific tag. You can say, “I only want 2.6.2.”

If you lock your base image into a certain tag like that, then you can absolutely think of that as a fork. If you say, I’m gonna create my own base image just based on this other third-party image, and I’m gonna lock it into a certain version number, a certain tag, at that point I have effectively forked it, unless they re-publish under that tag, which they could do, but in general the expectation would be that they wouldn’t.
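
In Dockerfile terms, the difference is just the tag on the FROM line (the image name here is hypothetical); these are two alternatives, not one file:

    # Floating: each build may pull whatever the author last published under "latest"
    FROM someorg/base-image:latest

    # Pinned: effectively a fork, as long as the author never re-publishes that tag
    FROM someorg/base-image:2.6.2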

Jon: Chris, it sounds like there are kind of three things if you do that. There’s the original base image and its tag; then there’s gonna be some code that you have in GitHub that says, “Here’s how to make the image that I want from my original image,” because it’s gonna install some stuff onto that original image; and then there’s the artifact that it creates, which then gets put back into Docker Hub or ECR, Elastic Container Registry, as your new tagged base image.

Chris: Yeah. You can actually further protect yourself from what I was just talking about, where the original base image gets re-published with the same tag. For almost all of these Docker images that you would use as base images, you can see the actual Dockerfile that was used to create them. You could fork that, go and fork the actual Dockerfile for that image, and then build it yourself.

Rich: Now it’s my code. I could change the Dockerfile, but at least it has, I don’t know, however many lines already done for me, and now I can add to it or remove from it based on the project.

Chris: Yeah, yeah. Almost all of the open source ones, the Docker images that are in public repos, will publish the Dockerfile, and you can build from the actual Dockerfile instead of grabbing the image from the repo that they publish to.
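
Putting that together, building and publishing your own version of a base image might look like the following; the image name, tag, account ID, and region in the ECR registry URL are placeholders:

    # Build your derived base image from the Dockerfile kept in source control
    docker build -t my-base-image:1.0.0 .

    # Tag and push it to a registry (here, a hypothetical ECR repository)
    docker tag my-base-image:1.0.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-base-image:1.0.0
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-base-image:1.0.0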

Jon: Right on. Given how technical today’s conversation has been so far, we’ve talked about volume mounts, we’ve talked about base images, I think that may be enough. My head is already spinning with all this stuff. We had thought about talking about, “Okay, now we have containers and they may need to talk to one another; how do you deal with that?” But that sounds like maybe something we should talk about next week.

Chris: Oh, man, Jon, just getting started, we got so much more to talk about. I can keep you guys here for hours. No, seriously, I agree, I think that’s a lot to cover today, it definitely was a bit of a deep dive, I think. Plenty of stuff to talk about in future episodes.

Jon: Great. Thank you for joining us today, it’s been a fun experience.

Rich: Great, thanks guys.

Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources is available at mobycast.fm/06. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you, and we’ll see you again next week.
