Jon Christensen and Chris Hickman of Kelsus and Rich Staats of Secret Stache conclude their series on Bret Fisher’s DockerCon 2019 session titled, Node.js Rocks in Docker for Dev and Ops.
Some of the highlights of the show include:
- Breaking Docker News: Steve Singh steps down as CEO; former Hortonworks CEO Rob Bearden to take over
- 1-2 Punch Prediction: VMware will overpay to acquire Docker; win-win for both
- Episode 59 recap of Node Dockerfile best practices
- Node Process Management: Rely on Docker and orchestrator for code errors
- Aspects of healthy shutdown and graceful stop for Node application in a container
- 3 Node Shutdown Options: Modify Docker start command using Tini; build Tini into Docker image; update Node app for Linux signals
- Connection Tracking: Use Stoppable to wrap Node.js server for shutdown/close function
- Security Scanning and Auditing: Identify vulnerabilities to know what code to change
- Leverage healthchecks using Docker Compose
- Production Checklist:
  - CMD node directly
  - Build with .dockerignore
  - Capture SIGTERM, shut down properly
  - npm ci or npm i --only=production
  - Scan/audit/test during builds
  - Healthchecks (readiness/liveness)
Rich: In episode 60 of Mobycast, we conclude our series on Bret Fisher’s DockerCon session, Node.js Rocks in Docker for Dev and Ops. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.
Jon: Welcome, Chris. It’s another episode of Mobycast.
Chris: Hey, Jon. Welcome. It’s good to be back.
Jon: It’s good to have you back. We’re doing part two today of the talk that you listened to at DockerCon, one of your favorites, from Bret Fisher: Node.js Rocks in Docker for Dev and Ops. This is part two of that.
Last week, we talked about just some Node Dockerfile best practices. Maybe you could just give us a quick recap on what we covered there and then we’ll jump into the rest of the talk. There’s a lot more good stuff.
Chris: Absolutely. First, I was hoping we’d do our, “What have you been up to?”
Jon: Oh, absolutely.
Chris: Just because I have something I definitely just want to talk about.
Chris: It’s interesting news. Just recently, Docker issued a press release saying, “Steve Singh is stepping down as CEO.”
Jon: We were just talking before the podcast started and I didn’t know you had this secret.
Chris: Yeah. Keep it fresh and spontaneous. Not terribly shocking news but pretty interesting for a whole bunch of reasons. They issued the press release a week after DockerCon. It’s like, “Hmm, probably would have been nice to do this before DockerCon,” to help make that introduction, if you will, and have the transition. According to Docker, the deal wasn’t fully in place. That’s the reason why they didn’t do that.
Jon: It didn’t start with the huge security. They had three days before DockerCon.
Chris: Yeah. All this stuff factors into it, like what’s really happening behind the scenes. The official word from them is they’ve been talking about this for months and it wasn’t finalized until after DockerCon. But who knows? The person they’ve tapped to come in as CEO is Rob Bearden, who is the former CEO of Hortonworks. Also, a few other open source companies like SpringSource and JBoss.
Jon: Wow. Somebody who speaks the language of an enterprise for sure.
Chris: Yes and from an open source standpoint. It’s interesting there. Again, I’m just going to reiterate this. I’m just going to lay down the prediction. Docker is going to be acquired by VMware. VMware is going to overpay for them, but they’re going to do it. This could be a one-two punch for VMs to containers with now […], the Kubernetes founders and then Docker.
I think it’s actually going to be a win-win for both of the companies. I think Docker is definitely in a spot here, really struggling to find its footing and what that business model is. Acquisition is really the only exit for them. They keep talking about like, “Oh, yeah. We can go IPO and we’re going to be cash flow positive by the end of the year.” I just don’t see that.
Jon: I’d love to disagree and have a little bit of conflict on the show but that sounds right to me.
Chris: Yeah. We’ll see. I would not be surprised if within six months, there’s an acquisition.
Jon: I can’t remember the name of the CEO you said, but it would be interesting to see what kind of ties he has to VMware.
Chris: That’s interesting that you bring that up. He was COO at SpringSource and that was acquired by VMware for $420 million in 2009. At Hortonworks, apparently, he was there when they went public and they were acquired by Cloudera last year. He’s definitely got the experience and the chops. He’s got the connections side of things. Again, VMware, I think this is the next acquisition.
Jon: It does make sense. It would make VMware the source of VM. That’s where you go for operating systems that are not operating systems.
Chris: Yeah. Just the whole enterprise space, the hybrid cloud space, it actually makes sense. I think from an ecosystem standpoint, this acquisition makes sense for them and I think they will overpay for it. From just a pure revenue business standpoint, they will make something of it, hopefully. The value they place on it is not going to be based upon Docker’s standalone value. It’s going to be that synergistic value that someone in VMware can extract from it and build on.
Jon: Right. Very cool. Now, we can go into our recap of part one.
Chris: In part one of this, we got through basically talking about just Dockerfile best practices for Node.js applications. We talked about base images, that’s what you’re starting with, and what some guidelines are there for how to go build the best Dockerfile. We talked about node_modules and making sure that’s not included in your image. We talked about how, especially with native code sometimes being compiled in node_modules, it’s really important that you’re building that for your target platform correctly. And then we finished up talking about least privilege principles and specifically taking advantage of the built-in node user that comes with the official Node.js Docker images. It’s there but it’s not enabled by default. You have to do some work there to actually switch over to use that. That’s what we covered in the first part.
Here in the second part, we’ve got a lot more to cover. I think Bret’s talk was 40 minutes and that was probably the first seven. This is traditional DockerCon. Every DockerCon I’ve been to, it feels like these sessions are just like drinking from the fire hose. They’re always 40 minutes long but it always feels like, given the amount of material that they’ve given you, it’s at least 60 minutes if not 90 minutes worth of material.
Jon: Right. Going into the next thing that he talked about, it looks like he talked about Node process management. I’m really curious about this because recently, with the news of the container breakout stuff and at Kelsus Camp a couple of months ago, we went a little deeper. That wasn’t Mobycast, but it was just with our own company. We did some work to learn more about how Unix processes work. They’ve been on my mind recently. I’m curious what do you have to say about process management for containers.
Chris: There’s several subsections to this and all dealing in that space. We’ll be able to dive into that a little bit. It is interesting to just really understand what’s going on here. At the end of the day, it’s processes all the way down.
With that, as far as process management goes for your actual containers, you don’t really need anything extra there. Rely on your orchestrator to do that for you. Rely on Docker, rely on your orchestrator.
With Node.js in particular, there have been so many tools out there for dealing with process management. Things like the forever module, PM2, nodemon, and there’s the cluster module. There’s always been process management in that space.
Jon: For those that are listening that might not be Node people, I think it’s because Node doesn’t have a management process that keeps things running when there’s an error. If you do make a mistake in your code, the code just shuts down. It just terminates the process.
Chris: Absolutely. There is no built-in process management per se with Node. If you have […] exception, there goes your process. Your server’s done.
Jon: I think people who are used to working with application servers, whether it’s Java ones, or Python ones, or Ruby ones, are used to that application server being really resilient to errors happening in the components of its code.
Chris: Yeah. There’s the youthfulness of Node versus something like Java or .NET, too. These tools spring up and then you see more and more of them just coalescing with the maturity of the platform and whatnot.
There are also tools that have been out there for things like doing support and hot reloading of code. You’re developing on your machine, you’re running it, you’re testing it, then you go and change a line of code. You have a process manager that is doing things like watching the file system […] changes, and when it sees one, it kills the process and restarts it so it picks up those changes that you just made.
His point here was just like, “You don’t need to use this stuff in production on the server. Instead, just let your orchestrator and Docker handle this for you.” We have things like health checks. You can spin up as many of these tasks as you need, let your orchestrator do that, with the caveat that you can use something like nodemon when you’re developing locally. That makes sense because that’s going to give you things like hot reloading.
Another part of this was just really calling out that using npm start is an anti-pattern for your Node apps. Do not use NPM to start your app. Instead, you should be starting it directly via the Node process. We’ll get into a little bit more of why that’s the case.
Those are the two main ideas that he was pushing across there in that particular section of the talk. From there, it went into like, “Let’s talk about shutdown now. What does it mean for a healthy shutdown in a Node application?”
Jon: A Node application in a container, right?
Chris: It is for a Node application in a container. Some of this applies outside of containers as well. Specifically, running inside of a container, it’s running as PID 1. It’s the first process in the system or the container, the initialization. This is typically what’s happening. You’ve dockerized your Node app. You’re spinning up a container based upon that image. This is the first thing that’s running. It’s running as PID 1. It’s the init process.
Just some background information there. The init process in containers has basically two jobs. One is to reap zombie processes. Zombie processes are subprocesses that have exited but haven’t been reaped yet, often because they’ve lost their parent process. It’s also responsible for passing signals to the subprocesses. In general, with Node and Node apps, zombie processes shouldn’t be much of an issue. You’re really just running your Node app, so unless it’s spawning a bunch of other processes, it’s probably not going to be too much of an issue. The signaling is important especially with […] this shutdown comes into play.
For proper shutdown, Docker is using Linux signals to control the apps. This is things like SIGINT, SIGTERM, and then SIGKILL, the force quit, if you will. SIGINT and SIGTERM, these Linux signals are what allow that graceful stop of your application. When you do a docker stop on a particular container, this is what it’s doing. It’s sending SIGTERM to that container and then it waits. By default, it’s going to wait 10 seconds for that container to respond to that and shut down. If it doesn’t, then it will kill it. It’ll do that force quit on it.
In order to have a graceful healthy shutdown, you need a container that’s going to respond to these Linux signals. This is where it comes in with NPM. With NPM, it’s not responding to SIGINT and SIGTERM. If you’re using npm start to launch your app, you have a problem here. It’s not going to get processed at all.
Chris: That’s why npm start is the anti-pattern. The recommendation is to not do it.
Jon: If I could just make sure I understand, if you do npm start, then your Node app is running within a process that’s owned by NPM. Then, when you send signals to that same process saying, “Okay, we’re done. Stop,” NPM is like, “I’m busy. I don’t hear anything.”
Chris: Yeah. In that case, NPM is PID 1.
Jon: Yeah. Your actual Node code is maybe a child of that. Okay, got it. Interesting.
Chris: The other thing to keep in mind, Node by default is not responding to SIGINT and SIGTERM, but you can with code. You specifically have to add code in there to handle it. This gives rise to what you can do. He outlined three possible workarounds and solutions here. The first is a temporary or band-aid fix. Docker as of version 1.13 has an --init flag. What that does is it wraps your process with a lightweight init system. Basically, it’s called Tini, which is a module. It’s designed to run as PID 1 and to do the right things.
When you’re running your container with docker run, if you use --init, that’s going to basically use Tini as PID 1 and now you will be responding to the SIGINT and SIGTERM signals. The problem with that is that you may not have control over this. We run in AWS, but we’re using ECS. We’re not the ones making the docker run and stop commands. It’s actually the ECS Agent.
I haven’t looked into what integration the ECS Agent has with Docker and whether it allows you to actually specify this kind of argument in your task definition file.
Jon: I bet it does.
Chris: It probably does, but that’s what you could do. Alternatively, if you can’t modify your docker run command, then another workaround would be to simply build Tini into your image and have Tini be PID 1.
Jon: You’d have the last command of your Dockerfile be a Tini command?
Chris: That’s pretty straightforward and easy to do. You can fix this PID 1 issue. Probably the best way to do this, definitely the right way to do this is just update your Node app to make sure that it properly captures the Linux signals. Make sure it has handlers for SIGINT, SIGTERM, and SIGKILL. Actually, SIGINT and SIGTERM.
Chris: Because you […] some of those and you don’t need to kill, most likely. In addition to that, just don’t use NPM to start your app. Just call Node directly. Have Node be your PID 1, invoke that to start your app up, and then just process these signals correctly.
Jon: I could be wrong here, but if somebody’s listening and they’re like, “I really want to contribute an open source update. I want to contribute something to Node,” this could be an area where you could do something. It feels like there’s some general stuff that pretty much any Node app would benefit from if you just put it into Node itself. I mean, it’s like calling to be worked on. This is a thing where every single person that ever builds a Node app has to write this extra code. Why not have one person do it at the root of the tree instead of everyone doing it on their leaves?
Chris: Yeah. The good news here is that Node already has the event handling in it. It’s actually really easy to do this and it actually is boilerplate code to add in these event handlers for these Linux signals. What isn’t boilerplate is that for every app, shutdown is something different. The bodies of those event handlers is really up to you. You don’t want someone else doing it. You have to decide like, “I’m shutting down. What cleanup do I need to do?”
Jon: That I agree with. It’s so true. You never know that as a general case.
Chris: Yeah. There’s not a lot of work here to do it. As a standard practice, people don’t think about it. You do have the things like after 10 seconds, Docker’s just going to kill it. Instead of shutting down in two or three seconds, it shuts down in 10 seconds. It’s one of those things that maybe people just don’t scratch their head and ask, “Why is it taking 10 seconds to shut down every time? Why doesn’t Control-C work in the console?”
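As a rough illustration of the boilerplate Chris describes, the handlers for SIGINT and SIGTERM can be sketched as below. This is a minimal sketch, not code from Bret's talk: `makeShutdownHandler`, `closeServer`, and `cleanup` are hypothetical names, and the body of `cleanup` is the app-specific part that nobody else can write for you.

```typescript
import * as http from "node:http";

// Builds a one-shot shutdown handler. The helper and its parameter names
// are illustrative assumptions, not part of any Node.js API.
function makeShutdownHandler(
  closeServer: () => Promise<void>, // e.g. wraps server.close()
  cleanup: () => Promise<void>, // app-specific: close DB pools, flush logs
  exit: (code: number) => void, // normally (code) => process.exit(code)
): (signal: string) => Promise<void> {
  let shuttingDown = false;
  return async (signal: string): Promise<void> => {
    if (shuttingDown) return; // a repeated signal must not restart the sequence
    shuttingDown = true;
    console.log(`received ${signal}, shutting down`);
    try {
      await closeServer();
      await cleanup();
      exit(0);
    } catch (err) {
      console.error("shutdown failed:", err);
      exit(1);
    }
  };
}

// Wiring: Node runs as PID 1 in the container and handles the signals itself.
// A real app would also call server.listen(...); it is omitted here so the
// sketch exits right away when run on its own.
const server = http.createServer((_req, res) => res.end("ok"));

const onSignal = makeShutdownHandler(
  () => new Promise<void>((resolve) => server.close(() => resolve())),
  async () => {
    /* app-specific cleanup goes here */
  },
  (code) => process.exit(code),
);

process.on("SIGINT", onSignal);
process.on("SIGTERM", onSignal);
```

With handlers like these in place, the container can exit as soon as cleanup finishes instead of waiting out the 10 second grace period and being killed.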
Jon: Right. Where do we go next from here?
Chris: It’s part of this theme of just better shutdown, healthier shutdown. He talked a bit about connection tracking. Basically, you should track your HTTP connections and send them FIN packets when you’re shutting down. This is mostly for people that have connections. When you’re handling one of these signals, you’re going to be calling server.close() to shut down your server. By default, Node.js is not going to close keep-alive connections when that happens. They’re just going to be abruptly terminated. Instead, what you should be doing is sending FIN packets so that these things close in a graceful way. You’ll also want to make sure that you stop accepting new connections, existing ones get closed, and whatnot.
There actually is an open source NPM module out there. It’s called stoppable and this is something you can wrap your Node.js server object with. What that does is it provides for this really graceful connection handling. When you call it, it stops accepting new connections. It closes the existing idle connections without killing requests that are currently in flight. It allows for this really graceful shutting down, draining of those connections, and shutdown.
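To make the idea of connection tracking concrete, here is a hand-rolled sketch of the behavior described above. It is not the stoppable package's implementation or API; `trackConnections` and `stop` are hypothetical names, and the sketch assumes the tracker is attached before the server starts accepting connections.

```typescript
import * as http from "node:http";
import * as net from "node:net";

// Hypothetical helper: counts in-flight requests per socket so that idle
// keep-alive sockets can be closed politely at shutdown.
function trackConnections(server: http.Server): { stop: () => Promise<void> } {
  const inFlight = new Map<net.Socket, number>();
  let stopping = false;

  server.on("connection", (socket: net.Socket) => {
    inFlight.set(socket, 0);
    socket.once("close", () => {
      inFlight.delete(socket);
    });
  });

  server.on("request", (req: http.IncomingMessage, res: http.ServerResponse) => {
    const socket = req.socket;
    inFlight.set(socket, (inFlight.get(socket) ?? 0) + 1);
    res.once("finish", () => {
      if (!inFlight.has(socket)) return; // socket already closed
      const remaining = (inFlight.get(socket) ?? 1) - 1;
      inFlight.set(socket, remaining);
      // During shutdown, close the socket once its last response is sent.
      if (stopping && remaining === 0) socket.end();
    });
  });

  return {
    stop(): Promise<void> {
      stopping = true;
      return new Promise<void>((resolve, reject) => {
        // Stop accepting new connections; the callback fires once every
        // existing socket has closed.
        server.close((err) => (err ? reject(err) : resolve()));
        // end() sends a FIN on idle keep-alive sockets rather than leaving
        // them open until the process is killed.
        inFlight.forEach((count, socket) => {
          if (count === 0) socket.end();
        });
      });
    },
  };
}
```

A SIGTERM handler could then await `stop()` before exiting, so requests that are mid-response get to finish while idle connections are released immediately.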
Jon: I feel like it’s maybe just the Node ethos of super, super minimal and very, very lightweight, making no assumptions about what people might be using it for that has led to this. I come from more of a, “Man, things are a lot easier when you have some opinions in your software.” It bugs me that this is even a thing. It also bugs me that stoppable is a separate module that you need to add. I would just rather have the peace of mind that this is not something that I need to think about. The fact that you have to think about this is almost like showing off. It’s like, “Guess what I thought about, everybody?”
Chris: FIN packets.
Jon: Yeah. I don’t know. Anyway.
Chris: From day one, the ethos of Node was that it’s going to be very much like Unix system software. Do one thing really, really well and then you can have a community of other tools that build up around it, that build up that ecosystem. I think now, perhaps they wouldn’t want to be so hardcore about that, except now they have things like backward compatibility. If they go and change how this works, then what do they break? They’ll probably break a lot and they’re going to get a lot of flak for that and whatnot. Some of these things, you made the decisions eight, nine, ten years ago, and we just have to live with them now.
Jon: Right. Boy, does it feel to me like a decision that wasn’t actually made but like, “Oh, hey. Guess what we don’t do? We just didn’t get that. We just released. We released early and often,” and that was one of the things that we didn’t think about when we released.
What comes next after everyone remembers to use stoppable for their HTTP connections, particularly their keepalive ones, what comes next?
Chris: There was quite a bit of a discussion in Bret’s talk about multi-stage Dockerfiles and how to use those to basically break out your process into different stages like production versus dev versus test and whatnot. That’s a whole big topic. We’ve touched a bit on this in previous episodes of Mobycast and maybe we’ll do a future one as well, but we’re going to skip over that one for this recap just because it’s such a big topic. We can talk about that for quite some time and we want to keep this at a reasonable length today.
After talking about the multi-stage Dockerfiles, he talked a bit about security scanning and auditing. This is one of those things where it’s like a no-brainer. It should just be table stakes. It’s so easy to do. You can do it for free. You can use a paid service, but just get the auditing and scanning as part of your CI process.
NPM has npm audit, and that’s going through and checking against known security vulnerabilities. You can also do full CVE scanning with tools like MicroScanner from Aqua. It’s really easy to just build it into your Docker image or make it part of your CI process. Just do it. Make sure you’re doing CVE scanning. This is definitely something that, on our team, we’re going to be working towards and making sure we’re doing.
Jon: Yeah. I think a great way of finding yourself not doing it is if you ignore one of those NPM audit messages three times, then you’ll ignore it forever. Don’t let yourself do that three times in a row, just like with any kind of code compiler warnings. I’m sure everybody has been on a project where the first time you compile it, you’re like, “How are we living with 5000 compiler warnings? How did this happen?” It’s the same kind of thing, and it’s more serious when it’s a security audit than when there are just compiler warnings about code that should probably be written differently.
Chris: Yeah, and even just knowing, just seeing the results of it. If you run these scanners, you’re going to see quite a few things. Some of those are going to be things you can change and some of those are going to be stuff you can’t do anything about because it’s coming from dependencies. But just having the knowledge of what the surface area is here, what’s going on with the code that we’re writing in particular, and the dependencies we have control over. Just knowing that, just having that information, and then you can make that decision on how strict you want to be. Do you really want to fail the build or do you want to continue on? Just knowing or having that knowledge is what is […] here.
Jon: It’s wild how fast those audits get updated. I’ve been writing some Node code myself for the past few months and I’ve just been blown away. I cleared out all my security audits, which always requires a little bit of work, a little bit of updating, and then two, three, or four days later it’s like, “Whoa, there’s another one.” It works. They really can track.
Chris: Absolutely. It’s done at the CVE level where it’s coming from across all different software packages and whatnot. They’re also doing more security evaluations of the actual modules themselves inside NPM which is where they’re actually doing security audits and flagging issues.
Jon: And then it’s important, too, in places like Babel when you’re doing transpiling and you’re actually letting something touch every piece of your code. It’s so critical to make sure there’s not malware or something like that.
Chris: Yeah, or that an NPM module that you’re installing doesn’t go and grab passwords or grab creds that are in memory and then forward them along to some proxy or something.
Jon: The point was not to single out Babel. The point is just that it’s something that has access to everything. Where do we go from security scanning?
Chris: There was quite a bit of discussion just about Docker Compose and having that as part of your workflow, especially how it integrates with things like Docker health checks. He also made a point of calling out some myth busting about the Docker Compose YAML versions. There was V1, then V2, and now there’s V3.
One of the myths is that V3 replaced V2. It did not. V2’s focus is basically on single-node development and test, whereas V3 came out really for multi-node orchestration. It’s really for tools like Swarm and Kubernetes. It has the additions needed in order to enable those tools for things like deployments and managing clusters and whatnot.
This is just a high-level point. Just realize that if you’re on V2 of Docker Compose YAML, that’s okay. You’re not missing much. You don’t have to go feel like you’ve got to upgrade to V3. So, something to keep in mind.
Jon: I have some feelings about that but I’m just going to let them go. I’ll just stick with V2 and not think too hard about it.
Chris: There you go. He talked a bit about node_modules, specifically how you mount this. Sort of volume mounts for […] mounts. Again, we’ve talked about this in the past, poking the hole between the container and the host, what you’re sharing, and finding that right mix between the isolation that containers promise versus the utility, the flexibility for developers to do things like hot reloading or whatnot.
There are various techniques that you can use to make sure you’re not getting into a situation where you’re using node_modules that were compiled for one target but you’re running on another one. This, again, is one of those bigger topics we could talk quite a bit on, so let’s just leave it at that.
The one little tip here that was useful is, if you’re on a Mac and you are using volume mounts, just use the delegated declaration inside your Docker Compose file. When you specify the volume inside Docker Compose, if you just use whatever name you want on it and then colon, delegated, you’re going to get a performance increase. Go and do a search on it. Go look up some more on it. As a pro tip, you can get some better performance there if you use that delegated write mode on your volume mounts.
Jon: That’s interesting. I think that’s perfect because it dovetails with an episode we did before about not letting things be a mystery. We don’t really have time today to go into exactly how that works and why it gives you that performance increase, but don’t do that without understanding that. Go read about it and then do it.
Chris: Crawl, walk, run.
Chris: The final section on Bret’s talk was about health checks, specifically, how you can leverage Docker health checks via your Docker Compose. Again, we did a whole episode on health checks and […]. We can probably do a whole nother episode just on Docker health checks, compose files, how you set those up, and how you can specify dependencies and conditions based upon what the health checks are, and what the status of the health checks are for the various different services that are in your Docker Compose file.
The point here was just definitely be aware of this. Definitely consider leveraging it and using it. It’s part of the infrastructure that you’re using with Docker Compose and you should take a close look at it.
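For readers who have not set one up, a Docker health check boils down to a command that runs inside the container and exits with 0 for healthy or 1 for unhealthy. The sketch below shows one way such a command could be written for a Node service. The `/healthz` path, port 3000, and the `checkHealth` name are assumptions for illustration, not details from the talk.

```typescript
import * as http from "node:http";

// Resolves true when the endpoint answers 200 within the timeout,
// and false on any other status, error, or timeout.
function checkHealth(url: string, timeoutMs = 2000): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const req = http.get(
      url,
      { timeout: timeoutMs, agent: false }, // one-off connection, no pooling
      (res) => {
        res.resume(); // discard the body
        resolve(res.statusCode === 200);
      },
    );
    // On timeout, destroy the request; that surfaces as an "error" event.
    req.on("timeout", () => req.destroy(new Error("health check timed out")));
    req.on("error", () => resolve(false));
  });
}

// A health command would map the result to an exit code, for example:
//   checkHealth("http://localhost:3000/healthz")
//     .then((ok) => process.exit(ok ? 0 : 1));
```

Keeping the check as a small script that calls Node directly avoids needing extra tools such as curl in the image.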
Jon: Very cool. I think we can finish up with something I’ve been eyeing this whole time. It’s a checklist and I love a good checklist. What do we have?
Chris: In summary, here’s your production checklist for running Node.js under Docker. One, just make sure your CMD is calling Node directly. This, we talked about. Don’t use npm start. Instead, just have Node be your PID 1. As we said, make sure you’re handling, capturing SIGTERM and properly shutting down.
Also, when you’re building, make sure you have a .dockerignore file and definitely make sure things like node_modules are included in there, your .git directory’s in there, log files, any other artifacts. Just make sure that, whatever’s on your build machine, when you’re building images you’re excluding the stuff that shouldn’t be in that image with the .dockerignore file.
Another part of the checklist is to make sure you’re using npm ci, or you can also do npm i using the --only=production command flag. You want the minimal set of code and artifacts when you’re doing an npm install. Use those options.
Jon: I’m just curious about that. I’m not as familiar with NPM as I should be. npm ci or npm i --only=production, I would guess, knows to only put the parts that production needs into production. I’m guessing that whoever’s writing the library that you’re installing with NPM needs to be aware of that and know the difference between what they should put into a production build versus a development build. If they weren’t aware, then there probably is no difference. […] feature, right? Like, did you build support for this into your library or not? I guess that’s my question.
Chris: Yeah. I think this really just applies to your package.json file. In package.json, you can have your dependencies and then you have your devDependencies. Production is just going to install the dependencies, not the devDependencies.
Jon: So it’s not down to a library level?
Chris: No. Don’t think of it like, “Oh, I’m back in C land and it’s now an optimization flag on my compiler to do things like unroll loops,” or anything like that. It’s not that.
Jon: Okay, thank you.
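The distinction Chris draws can be modeled in a few lines. This is a simplified sketch of the selection rule only: it ignores transitive, optional, and peer dependencies, and `packagesToInstall` along with the example package versions are illustrative assumptions rather than how npm is implemented.

```typescript
// Minimal shape of the two package.json fields discussed above.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the top-level package names an install would request.
function packagesToInstall(pkg: PackageManifest, productionOnly: boolean): string[] {
  const names = Object.keys(pkg.dependencies ?? {});
  if (!productionOnly) {
    names.push(...Object.keys(pkg.devDependencies ?? {}));
  }
  return names.sort();
}

// Example manifest; the packages and version ranges are placeholders.
const examplePkg: PackageManifest = {
  dependencies: { express: "^4.17.0" },
  devDependencies: { nodemon: "^2.0.0", jest: "^26.0.0" },
};

console.log(packagesToInstall(examplePkg, true)); // logs [ 'express' ]
console.log(packagesToInstall(examplePkg, false)); // logs [ 'express', 'jest', 'nodemon' ]
```

The choice is made entirely by the app's own manifest, which is why library authors do not need to build anything special for a production install.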
Chris: Yeah. Another bullet would be just make sure you’re scanning, auditing, and testing your builds. We talked about that CVE scanning and using things like npm audit. Really leverage what’s there that you can take advantage of in your CI/CD pipeline. Then, health checks. Again, for readiness and liveness. Docker and the infrastructure support a very robust ecosystem of health checks there, between specifying conditions, how they’re used in your Docker Compose file, and Docker itself making those health checks through the Docker daemon.
Jon: Alright, excellent. Thanks so much to Bret Fisher for this good talk that we’re able to go over. I learned a lot from it even though I wasn’t present in the audience. Thanks for explaining it to me, Chris.
Chris: Yeah, you bet. It’s a very good talk, a laundry list of very much actionable things to go do. I enjoyed this because there’s at least two or three things here, it’s like, “Yup, we got to go and do that.”
Jon: Right. Of the DockerCon talks that you listened to, I think there’s one more that we might dip into the DockerCon bucket for and do another episode on. That’ll likely come up next week. I’m about to go to GlueCon in a couple of weeks, too. Shoutout to that conference, a really fun one that I’ve been to for a few years now. There should be some interesting things out of that one to talk about, too.
Jon: Thank you so much, Chris.
Chris: All right. Thanks, Jon.
Jon: Talk to you next week.
Chris: See you.
Rich: Well dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources is available at mobycast.fm/60. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.