June 26, 2019

66. Using Feature Flags to Increase Velocity and Decrease Risk in a Modern CI/CD Delivery Pipeline


Chris Hickman and Jon Christensen of Kelsus and Rich Staats of Secret Stache describe using feature flags to increase velocity and decrease risk in a continuous integration and continuous deployment (CI/CD) pipeline.

Some of the highlights of the show include:

Kelsus: CI/CD pipeline and project pain leads to an imperfect world
Integration: Write, merge, integrate, and test code
GitFlow: Branches in the source control system manage and merge code for different environments (production, staging, and development)
CD Criteria:
- Effective and automated testing as a system
- Rollback ability (all-or-nothing releases, blue-green deployments vs. feature flags)
Problems occur and make it into production; feature flags reduce risk and limit damage
What is a feature flag? A temporary or permanent conditional statement that decides whether a code path is enabled
Benefits of Feature Flags: Reduce risk by decoupling deploying from releasing, build environments reflecting production, and perform testing directly in production
Disadvantages of Feature Flags: Require a significant amount of work, code, and design, and are difficult to adopt in legacy and ongoing systems
Close off access to certain features on case-by-case basis

Links and Resources

Mobycast 04: The CI/CD Pipeline

GitFlow

Google Cloud

Amazon Web Services (AWS)

Rich: In Episode 66 of Mobycast, Jon and Chris discuss using feature flags to increase velocity and decrease risk in your CI/CD pipeline. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.

Jon: Welcome, Chris and Rich. It’s another episode of Mobycast.

Chris: Hey.

Rich: Hey guys, it’s good to be back.

Jon: Today we’re going to go straight into it. We get to talk about feature flags which are hot. Everybody has already started working on their CI/CD pipeline, that’s less of a thing. People are at least doing something to have a CI/CD pipeline pretty much across the board on new projects as a consideration, I would say. Now it’s like, let’s work on the CD part of the CI/CD pipeline, let’s actually start thinking about continuous deployment on new projects and existing projects. I think a lot of folks doing DevOps and working in the public clouds, Google Cloud, AWS, Azure, are like, “Let’s take advantage of some of this capability and start to think about whether we can deploy things automatically or at least at the quick press of a button.”

I think today, we’re going to talk about some pain that Kelsus has with one or more of our projects and what we’d like to do about it. Specifically, we feel like some of this pain, we can resolve with feature flags. We’re going to describe what those are and how they’re related to our current CI/CD process and then how they could help us. Chris, maybe you can take what I said and make it more concrete.

Chris: We talked about this modern CI/CD pipeline and what does that mean? There’s really two pieces to it. There’s the continuous integration part, CI, and then the CD part, that’s continuous deployment. It’s integration and it’s deployment. Integration means we’re writing code. Many folks on the team are writing code. We need to get that code all together, merged, integrated, and tested. The philosophy behind continuous integration is that that’s constantly happening.

Teams have to decide what kind of workflow they’re going to use. One of the popular ones out there is GitFlow, which prescribes a way of using branches in your source control system, and when you do merges and bring the stuff together. For us, personally, we definitely use the GitFlow philosophy where we end up having branches for each one of the environments that we’re deploying to. If we have three different environments (production, staging, and development) then there are three branches that manage that. It works pretty well but where it gets tricky is when we need to take bits and pieces of code and move them from one environment to another one. We’ll get more into this later.

Jon: I just want to add that we did talk quite a bit about this in another episode that we did last year. We just talked about our overall CI/CD pipeline and what it looks like. We’ll link to that in the show notes. For the most part, it hasn’t changed too much. I think we’re looking at doing a new face on it, that’s what this episode is about.

Chris: Good point. Definitely, folks may want to refer back to that previous episode. I just want to highlight there’s the continuous integration part, branching, merging, and how you do that normally. In a perfect world, everything goes really smoothly but in real life, sometimes it can get a little bit tricky.

Jon: And if you could imagine, you do stuff on dev, then you went to stage, then maybe to prod, and I’m done. And then I’ll do that over and over and over again happily ever after.

Chris: That’s the way it always works, it’s the perfect world. It’s like a conveyor belt and it’s all just always going in that flow. The truth of the matter is it doesn’t work that way. You find out, it’s like, “Oh no, we did deploy to production and there is a pretty big bug that we now have to hurry up and fix. We need to test it first so we need that in staging but what’s in staging right now is actually the next big release that’s not fully tested yet. We definitely don’t want that to go to prod. Now, how do we actually make that hotfix and only deploy that to prod and not all the stuff that’s on tap?” That’s the part where the situation starts getting hairy and where something like feature flags is going to help us out.

Jon: I guess one thing you could do is revert staging to where prod is and then fix it on staging and then move staging to prod, but then, it’s such an interruption, right?

Chris: It’s an interruption, it’s risky. It depends on where you’re at. It may not be possible. Think about if what you’re working on in staging actually involves database migrations, so you’ve changed the schema.

Jon: Maybe the hotfix that you need on prod is not one that can wait like an hour or two; it needs to be on there in the next few minutes.

Chris: You do what everyone else does. You SSH into your prod server; git pull. I didn’t say that. That’s the CI part. The CD part is, now that you’re integrating and now it’s all good, then deployment just means it’s automatic, it’s just being continuously deployed out to prod. That definitely has its issues. There’s actually not a lot of folks out there that do this because it does require a fair amount of sophistication.

For me, the criteria for being able to participate in continuous deployment are that, one, you have to have really good automated testing as part of your system, and then two, you also have to have the ability to roll back. Those are really more typical of all-or-nothing releases, like blue-green deployments, versus if you have something like feature flags. That actually gives you some more flexibility here and you can have a more progressive approach to deployments where it’s not all-or-nothing, rather it’s an incremental type thing. We’ll get into that a bit more.

I just want to highlight that we all talk about CI/CD, everyone’s doing it, everyone’s talking about it, but feature flags really factor into this to help out with some of the real world production issues that you have. They can enable you to increase your velocity and also decrease that risk associated with it.

Jon: I want to try to distill the two main problems that I think you describe, that we’re going to try to solve with feature flags. The one I heard was that it’s not always dev-stage-prod, dev-stage-prod. We have to be able to get stuff into prod directly or stuff into dev or stage directly. The other problem that I heard you say is things are not perfect, even after they’ve been tested. Sometimes, especially after they’ve been tested, as hard as you work to create a team that takes great pride of ownership in their work and really exercises everything that they did, the reality is that problems happen and they make their way into production. How can we limit the damage? Those are the two things, right?

Chris: Those are the two primary things, the two takeaways. Those are it. When we have problems with velocity, that’s related to this continuous integration part and the merging. It’s not perfect. We have issues there and that causes us to slow down. We have to do extra work or just do unnatural things. The second part of it is if we’re working on a big feature release and it’s in prod and now a bunch of users are seeing that, how confident are we that that’s been really fully tested and what are the ramifications of that? For certain things, it’s not that big of a deal. But if it’s a core piece of functionality or it’s something that the business really relies on, mission critical or something like that, the risk is much higher. With feature flags, we can reduce that risk.

Jon: That’s the moment we have been waiting for. What is a feature flag?

Chris: At the end of the day, it’s really not terribly exciting. It’s essentially just a conditional statement. What you’re doing is in your code, you’re just saying, “If this feature is enabled, then go ahead and execute this code path, else do this other path.” It’s really just that simple. It’s just a conditional statement. It is something that you don’t get for free. You have to be thinking about this when you’re writing code and you have to think of your code in the way of should this be enabled or not. If it’s not enabled, then what is it going to do? You need to think in that manner when you’re writing your code and then actually put in these conditional statements. At their core, that’s what they are. It’s a conditional statement that’s wrapping a particular block of code that you may or may not want to enable based upon some set of conditions.

Jon: That makes sense. That’s easy enough. I’m immediately imagining them all over the place. How much code should I start or stop from happening? How do I even remember where they are or how many there are? I’m just imagining spaghetti as soon as you said this. Maybe you can get into a little bit more, what you’re thinking we might be able to do to keep this manageable, clean, and usable. Do you want to go there? I guess maybe before we go there, we should also talk about, so you do it. Now what do you do with it? We have this code that can be turned on and off. What are we going to do about it?

Chris: There’s a lot to unpack there. Let me talk a little bit more about feature flags themselves and expand a little bit on that definition and the types of feature flags. After that, we can just go through and talk about what this enables for us. We do all this work. You’ve alluded to a lot of the challenges that we have with feature flags. This is not something you get for free. There are challenges with this.

There’s a reason why lots of people don’t have a rich robust system that has feature flags in it. It’s not easy. This is complicated stuff and it really is easy for the wheels of the bus to come off quickly. We’ll get into that, too, the challenges, like what’s the implementation philosophy? How do you go and do this? There’s definitely a crawl-walk-run approach to it. And then, how do you also manage it?

Those are interesting, important topics that we can get into as much as we want to. Maybe we’ll just talk a little bit more about the feature flags. It’s a conditional expression. Definitely at the very least, you define this flag by a name, you have to give it some kind of name or some way of referring to it. There’s that aspect to it.

Optionally, if you’re doing something that’s more powerful and more sophisticated, you would also provide context to this. Based upon name and context, that is what feeds into your decision in that conditional expression on whether or not this should be enabled. In the simplest case, you’re not going to take any context into account. It’s really just a toggle. It’s either on or off. You may have something that’s just hard-coded or a constant in your code.

You could imagine in your config, you have something like use_v1=true and your feature flag ends up being something like, “If use_v1, take the V1 code branch, and if it’s not, then use the V0 branch,” or something like that. Versus context, which is completely arbitrary. It’s up to you on how you want to make these decisions on whether or not to enable the feature flag. It could be like, “Here’s my user context,” meaning which user is actually calling it. It could be something about the environment that you’re in, or it could be something related to more macro settings or metadata about what’s going on in the system. Whatever state or information you want to use in order to make these kinds of decisions.
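A minimal Python sketch of the static toggle described here, assuming the flag lives in a plain dictionary. The `use_v1` key comes from the conversation; `CONFIG`, `checkout`, `handle_v0`, `handle_v1`, and the discount logic are hypothetical names and behavior used only for illustration.

```python
# Hypothetical static configuration; in practice this might be loaded
# from a config file or an environment variable at startup.
CONFIG = {"use_v1": True}


def handle_v0(order_total: float) -> float:
    """Existing (fallback) code path: no discount applied."""
    return order_total


def handle_v1(order_total: float) -> float:
    """New code path guarded by the flag: a 10% discount (illustrative only)."""
    return round(order_total * 0.9, 2)


def checkout(order_total: float, config: dict = CONFIG) -> float:
    # The feature flag itself is nothing more than this conditional.
    # A missing key is treated as "off" so the old path stays the default.
    if config.get("use_v1", False):
        return handle_v1(order_total)
    return handle_v0(order_total)


print(checkout(100.0))                     # flag on in CONFIG
print(checkout(100.0, {"use_v1": False}))  # flag off
```

Changing the behavior here means changing the config and redeploying, which is the trade-off of a static toggle.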

Jon: I can imagine a feature flag turning on a feature for a certain set number of users, or named users, or only during times when traffic in the system is low. All kinds of cool stuff.

Chris: Absolutely. It can be super powerful. We have these conditional expressions. They’re defined by a flag name and optionally some context. These conditions are either basically static conditionals or they’re dynamic. We talked about the static part. This is hard-coded and it’s based upon some constant or configuration, not really changing. It’s basically a toggle, versus dynamic, which is like, “I’m passing in this context at run time, in real time, making these decisions on whether or not this code path should be taken.”
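The dynamic case could be sketched as a flag name plus a context object evaluated at run time. This is only one possible shape, not the approach of any particular product; `FlagContext`, `RULES`, `is_enabled`, the flag name, and the user names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagContext:
    """Information passed in at run time to decide whether a flag is on."""
    user_id: str
    environment: str  # e.g. "dev", "staging", "prod"


# Hypothetical rules: each flag name maps to a predicate over the context.
# Here the feature is on everywhere except prod, where only named users see it.
RULES = {
    "new-dashboard": lambda ctx: ctx.environment != "prod"
    or ctx.user_id in {"alice", "bob"},
}


def is_enabled(flag_name: str, ctx: FlagContext) -> bool:
    rule = RULES.get(flag_name)
    # Unknown flags default to off, so a typo never turns a feature on.
    return bool(rule and rule(ctx))


print(is_enabled("new-dashboard", FlagContext("alice", "prod")))
print(is_enabled("new-dashboard", FlagContext("carol", "prod")))
```

The same call site works whether the rule looks at the user, the environment, or something else, because the decision is kept behind `is_enabled`.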

Another thing to point out is that feature flags can be thought of as either temporary or permanent. It may be like you’re working on a new version, V2 of something, with the intent that you’re going to deprecate V1. With that, you may have a feature flag in there for V2 versus V1. At some point after you’ve fully migrated over to V2, this flag doesn’t make sense anymore. You just want to go in and delete it all. That would be an example of a temporary feature flag. A permanent one might be something that’s going to be there forever, and maybe you have an advanced feature that’s only for a certain set of users. That’s a feature flag. That’s going to be something that’s probably going to be there on a more permanent basis.
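One way to keep temporary flags from lingering is to record, next to each flag, whether it is permanent and when a temporary one should be deleted. The sketch below is an assumption about how that bookkeeping might look; `FlagDefinition`, `REGISTRY`, `stale_flags`, the flag names, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FlagDefinition:
    name: str
    permanent: bool
    remove_after: Optional[date] = None  # only meaningful for temporary flags


# Hypothetical registry: one temporary migration flag, one permanent flag.
REGISTRY = [
    FlagDefinition("use-v2", permanent=False, remove_after=date(2019, 9, 1)),
    FlagDefinition("advanced-reports", permanent=True),
]


def stale_flags(registry: List[FlagDefinition], today: date) -> List[str]:
    """Return names of temporary flags whose removal date has passed."""
    return [
        flag.name
        for flag in registry
        if not flag.permanent
        and flag.remove_after is not None
        and today > flag.remove_after
    ]


print(stale_flags(REGISTRY, date(2019, 10, 1)))
```

A check like this could run in CI to remind the team to delete the V1 path once the migration is done.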

Jon: It totally makes sense.

Chris: That’s feature flags in general. What are some of the benefits that we […]? Why would we do this? Maybe as we go through and talk about this, we can also just talk about our own experiences—real life, day-to-day—of the projects that we’re on, the software that we’re deploying, and some of the real-world challenges that we have. Personally, some of the projects that I’m involved with, I really wish that we had more capabilities in this area. Some of the reasons that we talked about, the challenges that we have with the continuous integration, the merging, and then also the continuous deployment, the risk of deploying new features, these feature flags would be very beneficial to us.

With that, as far as what you can do with these feature flags, one of the really big things is you can reduce your risk. Primarily, what’s happening is you’re decoupling deploying from releasing. That’s a big deal. You can deploy the code to prod, but it doesn’t necessarily mean you’re releasing that code. Maybe no one has seen that code yet, it’s still dormant, you haven’t activated it. You’re decoupling that. That gives you a lot of flexibility and a lot of power.

Jon: I want to do that on a couple of things that are coming up. I’m thinking a […] a couple of features that we’re about to release that are just world-changing features. If something goes wrong, it doesn’t look good. That’s how everybody sees it. As much as we retest, I can’t be 100% confident that it’s perfect until people actually use it in production.

Chris: With the reducing of the risk, it allows you to segment your users, too. You can deploy that and maybe you have thousands of users, or 100,000 users, but you can set it up so that only a small subset of these folks are going to see it. That’s going to be part of that context in the feature flag and the conditional, and now instead of thousands and thousands of people seeing it, maybe it’s only 20 or 50 or 100. You’ve really reduced the risk there by being able to control on a user basis who gets to see this.
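A common way to limit a feature to a small share of users is to hash the user ID into a stable bucket and compare it to a rollout percentage. The sketch below assumes that approach; it is not taken from the episode or from any specific product, and `rollout_bucket` and `is_enabled_for` are hypothetical names.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(flag_name: str, user_id: str, percent: int) -> bool:
    # A user is in the rollout when their bucket falls below the percentage.
    # The hash is deterministic, so the same user gets the same answer on
    # every request, and raising the percentage only ever adds users.
    return rollout_bucket(flag_name, user_id) < percent


users = [f"user-{n}" for n in range(10_000)]
exposed = [u for u in users if is_enabled_for("new-checkout", u, 2)]
print(f"{len(exposed)} of {len(users)} users see the feature")
```

Including the flag name in the hash means different flags pick different subsets, so the same unlucky users are not first in line for every experiment.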

One of the other things that helps reduce risk: we always have the challenge of testing, especially around data. We try to build environments that reflect production. You can go so far as to take snapshots of production databases and restore them to other environments and whatnot, but that’s kind of problematic.

Jon: And may not be allowed in some high-compliance environments.

Chris: Absolutely. It is a challenge to be able to do testing in a staging environment that really mimics production. The only thing that really mimics production is production. Feature flags help reduce this risk by allowing us to actually test in prod, which sounds crazy, like, “What? We want to test in production?” But when you’re minimizing the scope of this and who’s seeing it, it turns that on its head and gives you a lot of flexibility and power there while reducing the risk and the impact of that. There are some implementation details and challenges that we are glossing over a bit, but in general the principle is absolutely solid and applies there.

Jon: Absolutely. This feels like it could be a lot of software to write. If you’re doing this and you’re really doing it well, you’ve got an admin tool that’s letting you go in and say who’s going to be able to see the thing and who isn’t or you’re going to be able to do things like control percentages of who gets any feature. It’s like, “Hey wait, we’re already pretty busy doing stuff, meeting the needs of the business. I don’t really have time to write a feature flag system.” What do you say about that?

Chris: This can be very sophisticated, especially if you want to get the most power out of it. It’s a lot of work for sure and it’s a lot of code. It’s a lot of design. Just really thinking things through, like what is the feature and what is a feature that should have enablement versus disablement? How do you name the things? How do you manage these things? Is it a config file? Is it database? Is it a full-blown feature management service? These are all real world things that you have to consider.

Again it’s crawl-walk-run. I think there’s value no matter where you’re at to just start thinking this way and starting simple. There very much is a standard evolution for everyone that does work like this, where almost everyone starts off at the beginning with like, “I’m just going to have a feature toggle. It’s going to be basically hard-coded and if I’m going to change it, it just means a code change and redeploy.” That’s how you start.

If you have a certain feature that you’re not sure about or it’s a bigger feature and you just want to employ this technique on it, then go ahead and just do it as a feature toggle. Be very strategic so you don’t have to go and create a hundred feature flags or feature toggles. Start with one and then maybe it goes out to a handful or something like that. That might be what you do for three, six months and just kind of get your feet wet on that. And then you can go from there.

I think it’s much more difficult to adopt these feature flags in legacy systems and ongoing systems. It becomes much more work just to do the refactoring and to put these things in, to decorate your code appropriately, versus if you’re starting from scratch. Obviously, this becomes a lot easier. When you are starting from scratch, I think you have a lot more flexibility and a lot more leeway to think about this more long-term and to maybe put in more of the plumbing for doing that if you do decide like, “Hey, this is a design philosophy I really want to adopt.”

Jon: I have two things that I want to get at. One is, I’m a little surprised that when I talked about how you’re practically building a whole other product just to manage your feature flags once it gets fairly sophisticated, you didn’t mention that there are commercial products out there that help you do that.

Chris: Yeah, there are companies that have sprung up to deal with this issue. LaunchDarkly is one such startup; it came out of some of the folks that worked at Atlassian, where Atlassian made heavy use of feature flags.

Jon: […] Atlassian made heavy use of feature flags.

Chris: Obviously they had their own internal tools and services that they built to do this like, “Hey, we should go turn this into a product.” I’m sure it was the thought process.

Jon: I met a couple of other folks from LaunchDarkly at Gluecon a couple years ago and they were good people. I would say no tool can solve the problem of making the decisions of what part of your code to run and what not to run, and understanding the context around that decision. All that things like LaunchDarkly can do is give you a nice view of what’s running and what isn’t, and let you turn knobs on that.

Chris: It’s a great point. Maybe more bluntly to say, the hard work is on your back. These tools are not going to be able to do the hard work for you. You have to do it. They just make it so that some of that undifferentiated heavy lifting is done by other people—to borrow AWS’ terminology—but you have to do the hard work of, again, the design. What should be a feature flag? How am I going to make my decisions? What’s that context? What are expiration policies on it? How am I going to do segmentation? How am I going to manage who’s allowed to change these feature flags? That’s a big, big area that you have to deal with as well, just the administrative policies around it. If you do something like, “I’m going to change this feature flag, […] that to 100% of the users instead of it really should only go out to 2%,” that could cause major problems.

Jon: In LaunchDarkly’s defense, there is a general pattern that maybe a lot of products might use. If your context is typically going to be users or user sessions, then you might be able to say, “Hey LaunchDarkly, here’s where you go. Find out who’s logged in at the moment.” That integration is expected and assumed to be maybe a little easier, but still it doesn’t get in your code and turn on and off features. It doesn’t know what your feature is […].

Chris: And just because you have to do the hard work doesn’t mean that something like LaunchDarkly shouldn’t be part of that. If you do get to the level of sophistication where you do need a feature management service, then you should really think long and hard about whether or not you go roll your own versus […] something like LaunchDarkly. This is why they exist, they’ve gone and raised almost $80 million, I think, in investment capital. They’ve been around since 2014. You should be looking to leverage that infrastructure that’s there. But again, just realize they can’t do everything for you. There’s nothing changing the fact that you have to go and make code changes. You can’t use LaunchDarkly without doing code changes.

Jon: I said there are two things that I still wanted to dig into. The other one was just something that occurred to me at the beginning of the conversation. You want to close off access to certain features. A lot of times, if your software is user-facing and there’s something you don’t want users to use yet because it’s new, you may be able to put your feature flags in the UI. I was wondering if that’s a no-no or an okay thing in your mind. Basically, don’t show this UI if the feature is turned off. Then, maybe the whole back-end API is on and available to everyone and who cares because only the people with the client that can see the UI that activates that back-end are able to actually exercise it. Any thoughts on that?

Chris: I think it’s just totally up to you on a case-by-case basis on what makes sense. Say you have a typical microservice architecture where you’ve got front-end JavaScript clients like React or whatever, and you have a back-end RESTful microservice that’s implementing an API that’s called by those clients. If those are the only API subscribers that you have, then maybe you just feature flag on the front-end so that the JavaScript is not making those calls and you don’t worry about feature flagging on the back-end on whether or not you handle those calls.

By design, no one is going to make those calls unless it’s your front-end UI clients. You probably save yourself quite a bit of hassle by doing it that way if it’s that clean. If you have other clients or if it’s more of a public service and public API, that’s a different story. I think it’s on a case-by-case basis. I would err on the side of what’s most pragmatic for you.

Jon: I am sitting in an armchair so I think I’ll do a little bit of armchair architecture. It would seem to me that if you’re approaching this and you’re just doing crawl-walk-run, then a good principle might be to try to limit the amount of code that’s flagged off. Keep it small. Try to keep it, as much as possible, in known and expected places. If there’s a place where you have all your front door code, if you can limit it to like, “The feature flag turns on and off. These are just two or three lines of code, not this huge 500 line method, and it needs to be at the beginning and in the end or at the beginning and the middle.”

If there’s a way you can organize your code to where the feature flag is very clearly turning off a tiny part of it and turning on a tiny part of it, it seems like it’s going to be easier for everyone to understand. I’m just saying this because people might be out there thinking about how they’re going to implement this on their own projects and it seems like it could be useful to think about that.

Chris: I think this goes to things like modularity and refactoring. Ideally, you’d be doing this at the function level. If flag enabled, call this function. If not, then call this other function, that type of thing. Again, maybe it’s just the function call in the body of these conditional blocks. That’s where you’re implementing the different aspects, in those functions. Those should be refactored and maybe they actually end up using a lot of the exact same code. They’re further refactored but the differences are contained in those functions. The code ends up reading really nice. It’s like, “Here’s where it goes to do a new feature and here’s where it is for the existing or the fallback feature.”
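The function-level idea could look something like the sketch below, where the conditional contains nothing but two function calls and both paths share a helper. All names (`list_items`, `list_items_new`, `list_items_existing`, `_load_items`) and the sorting behavior are hypothetical stand-ins.

```python
from typing import List


def _load_items(user_id: str) -> List[str]:
    """Shared helper used by both code paths (stand-in for real data access)."""
    return ["apple", "cherry", "banana"]


def list_items_existing(user_id: str) -> List[str]:
    """Existing / fallback behavior: return items in stored order."""
    return _load_items(user_id)


def list_items_new(user_id: str) -> List[str]:
    """New behavior behind the flag: return items sorted alphabetically."""
    return sorted(_load_items(user_id))


def list_items(user_id: str, flag_enabled: bool) -> List[str]:
    # The flag check is a single small conditional; each branch is just
    # a function call, so the two paths stay easy to read and to delete.
    if flag_enabled:
        return list_items_new(user_id)
    return list_items_existing(user_id)


print(list_items("user-1", flag_enabled=True))
print(list_items("user-1", flag_enabled=False))
```

When the flag is retired, removing it is a matter of deleting one branch and one function rather than untangling a large method.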

Jon: The other advantage that you get from that is not just modularity. As much as possible you don’t want to have multiple huge different code paths with different logic, all alive and active and doing things, because that’s more surface area for things to go wrong. As much of your code as possible should be the same. I don’t know a better way to say that, but just don’t duplicate code where possible.

Chris: Yup. Definitely think about what’s your policy, what’s feature-flaggable versus what’s not. It gets really tricky quickly if you start feature-flagging within feature flags. You definitely want to be thinking that through. At the top level, you have two options. Each time that you have another feature flag within those, it’s a binary tree. Before you know it, maybe that’s 16 different code paths and it becomes pretty messy pretty quickly. What’s that level of granularity for what is a feature flag, that atomic unit?
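The arithmetic behind the 16 code paths: every additional boolean flag doubles the number of possible combinations, so four flags nested inside each other can give up to 2 ** 4 = 16 paths. A small sketch that enumerates them, with `flag_combinations` and the flag names as hypothetical placeholders:

```python
from itertools import product
from typing import Dict, List


def flag_combinations(flag_names: List[str]) -> List[Dict[str, bool]]:
    """Enumerate every on/off combination of the given boolean flags."""
    return [
        dict(zip(flag_names, values))
        for values in product([False, True], repeat=len(flag_names))
    ]


combos = flag_combinations(["a", "b", "c", "d"])
print(len(combos))  # 2 ** 4 = 16
```

A list like this can also drive a parameterized test, which makes the cost of each extra nested flag visible as a doubling of test cases.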

Jon: Right, Chris. That just reminded me of a talk you and I both went to in Denver a year or two ago with Strava when they talked about when they used feature flags. That was really cool and interesting. I think it’s worth repeating here.

When they build a new feature and they’re not sure if it really is good enough for this core product and they just want to get a sense if people like it or not, they don’t want to subject the development of that feature to the stringent code requirement they have for all the rest of the stuff inside the Strava application. They just want to let people crank something out, very lightly tested.

I can’t remember the word for this. It’s just essentially a lean process, just getting in front of a few people and see how they like it, but in the production code base, real Strava and not some beta version of Strava. They would use feature flags for that, turn them on for just a few people, get feedback, turn it off, and then rewrite the code using their development and engineering guidelines. Throw away all of the code and start over. That was a way of them controlling technical debt but also still being able to be very lean and fast.

Chris: It’s really similar to A/B testing and vetting that feature by getting user feedback on it. It becomes a very low risk, easy way of doing it. It’s very, very common. So many big companies do this. Netflix does it constantly. All the bigger companies, this is what they do. They have millions of users. They want to know like, “How is this new thing? How are people going to react to it?” Rather than making that change available to everyone, roll it out to just a few people, take your measurements from that experiment, and then make decisions based upon that.

Jon: I hope that we find our way to solving a couple of problems that not having feature flags has given us over the next month or two here.

Chris: Yes. We’re in a situation where we have a system that we’ve been working on for three-plus years. It’s very complicated, a very large code base, but having feature flags in there would really help out with some of the challenges that we face on a day-to-day basis. That’s going to be the fun problem for us: can we leverage feature flags more?

Jon: And with the legacy code base, too. Well great. Thank you so much for talking about this and thanks, Rich, for putting us together.

Rich: Of course.

Chris: All right. Thanks guys.

Jon: Talk to you next week.

Chris: See you.

Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources is available at mobycast.fm/66. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.
