June 13, 2018

14. Stop Worrying About Cloud Lock-in


At the recent Gluecon event, a popular topic was how to prevent cloud lock-in. Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache discuss why your time is better spent focusing on doing your job. If and when cloud lock-in becomes an issue, you will have the resources to deal with it.

Some of the highlights of the show include:

AWS Fargate is ‘serverless ECS’. You don’t need to manage your own cluster nodes. This sounds great, but we’ve found the overhead of managing your own cluster to be minimal. Fargate is more expensive than running ECS on your own cluster, and you have greater control if you manage the cluster yourself.
Cloud lock-in was a huge concern among people at Gluecon 2018. People from large companies talked about ‘being burned’ in the past with vendor lock-in. The likely risks are (1) price gouging and (2) vendors going out of business.
Cloud allows people to deploy faster and more cheaply than running their own hardware, as long as you don’t have huge scale. Few businesses get large enough to need their own data center on-prem to save money.
Small and startup companies often start off in the Cloud. Big companies often have their own data centers and they are now migrating to the Cloud.
AWS does allow you to run their software in your own data center, but this ties you to AWS.
There is huge complexity and risk in architecting a system to run in multiple cloud environments, and it almost certainly wouldn’t run optimally in all clouds.
We think the risk of AWS hiking prices drastically, or going out of business, is essentially zero.
If you were building a microservice-based multi-cloud system, some of the difficulties include: Which cloud hosts the database? How do I spread my services across two clouds? What about latency between cloud providers’ networks? How do I maintain security? How do I staff people who are experts at operating in both clouds?
It’s clear that lock-in is a real fear for many companies, regardless of our opinion that it shouldn’t be such a concern.
Jon thinks the fear of lock-in may drive cloud providers toward standardization; Chris thinks AWS doesn’t have a compelling reason to standardize since they’re the industry leader.
Our advice: as a small or medium-sized company, don’t worry about cloud lock-in. If you get big enough that it’s really a concern, we recommend building abstractions for the provider-specific parts of your system and having a backup of your system ready to run in a second cloud provider, but don’t try to run them concurrently.


Rich: In Episode 14 of Mobycast, Jon and Chris explained why you should stop worrying about cloud lock-in. Welcome to Mobycast, a weekly conversation about containerization, Docker and modern software deployment. Let’s jump right in.

Jon: Welcome, Chris and Rich. It’s another Mobycast.

Rich: Hey, guys.

Jon: Did they hear your voices? We’ve got a lot to talk about this week. I wanted to cover a little bit that we left on the table last week and some new stuff. We’re going to skip the fun part where we talk about what we did last week and just jump right in. Last week, we talked about serverless and we mostly spent our time talking about Lambda and then, after we finished recording, I was like, “Hey, Chris, I don’t really know anything about Fargate but it sounded really exciting.” Chris, let me just say that to you again but this time on air. Chris, I don’t really know about Fargate but it sounds exciting. Are we going to start using that?

Chris: That’s an interesting question, Jon. I haven’t heard that one before. It’s interesting because it definitely builds on the whole serverless thing. Fargate is essentially serverless ECS. Currently, ECS is the orchestration platform offered by Amazon for running your Docker containers on a cluster of EC2 nodes. We use that quite heavily, given that all of our software runs in containers. Fargate is something relatively new.

It was announced at re:Invent this past November and had been in beta for a while. It’s now actually live in multiple regions, and what it is, basically, is all the benefits of ECS but you no longer have to manage your cluster nodes. You basically say, “Here’s my container and, AWS, you take care of it. You find the hardware for it, you run my container, and I don’t have to worry about that.” I don’t have to worry about scaling out a cluster or patching my servers or anything like that.

Jon: That sounds awesome. Why don’t we just do that?

Chris: It does sound awesome if the problems that it solves were actual, real big problems. I would just say, from the experience of having used ECS in production now for two and a half years, I think it is, that one of the great benefits of ECS and containers in general has been that the management overhead of dealing with your cluster resources is actually really minimal. I’m now at the point where I probably spend time maybe once a month or maybe every other month doing some maintenance on my ECS cluster resources.

It’s very little overhead. There’s a little bit of time to get it set up but, once you understand the details of how to do that, it’s half a day to go from zero to hero with ECS. As another example, last week, we had four different clusters set up in ECS, all with many, many host nodes inside each one of those clusters. We wanted to upgrade to the latest AMIs to bring in new security patches and also get new ECS agent software on there and then swap out some of the stuff that we do in initialization of hosts in our user data script.

I ended up making those changes. I basically shot every single one of those hosts in the head, rebuilt brand new ones, spun them all up, and that was probably a total of four hours to rebuild four clusters with brand new machines with all the latest and greatest. So about four hours every two months is not much of an investment. And then there are the benefits of running your own ECS clusters: one, it’s definitely more economical. For the most part, it’s going to be more expensive using Fargate than it is running on your own ECS cluster hardware.

Jon: It blew my mind when you said that because I thought, “Wait. If Amazon gets to decide how they use their hardware and they don’t give you any access to it, then shouldn’t that be cheaper? Can’t they do some optimization at lower prices?” but apparently not.

Chris: Yeah, and who knows what the strategy is for pricing and whether this reflects the actual cost for them? Maybe it actually is more expensive for them to offer it this way, or maybe they’re viewing this as, “Hey, we’re adding more value here and this is worth a premium so we’re putting a price on our premium.” Time will tell whether pricing comes down and that inflection point happens where it’s actually more affordable to run on Fargate than it is on your own dedicated hardware, but you can also do great things like having your ECS clusters take advantage of spot instances and reserved instances. There’s lots of stuff you can do there.

I’m sure that if cost is your issue, you’re going to win out using normal ECS as opposed to Fargate. You also have a lot more control and insight over the service capabilities of what it’s running on. Do you have warmup time when spinning up containers in Fargate, kind of like you do in Lambda? Are you queued behind other folks, like you are in other services? Are you going to get a more jagged response time pattern for your jobs? Whereas if you’re running your own cluster, you know exactly what’s running on there. You have much more insight into that. There are also issues with application architecture, and sidecar containers, and logging, and making sure all of that works. For the time being, it feels like, for a lot of that, you just have a lot more control. It’s a lot easier. It’s a lot more integrated when you’re running it on your own ECS clusters as opposed to Fargate.

Rich: Hey, this is Rich. You might recognize me as the guy who introduces the show but is pretty much silent during the meat of the podcast. The truth is these topics are oftentimes incredibly complex and I’m just too inexperienced to provide much value. What you might not know is that Jon and Chris created a training product to help developers of all skill sets get caught up to speed on AWS and Docker. If you’re like me and feel underwater in these conversations, head on over to prodockertraining.com and get on the mailing list for the inaugural course. Okay, let’s dive back in.

Jon: That was a downright commercial for ECS. We should get some primetime spot and put it on there. That is a good segue into the rest of our conversation, which I think I’ll call How I Learned to Stop Worrying about Cloud Lock-In and Just Do My Job. ECS, obviously is very specific to Amazon, AWS. If you use it, you’re a little bit locked in and all the other things that you might use on AWS lock you in further and further.

When I was at Gluecon a few weeks ago, that was weird. I brought this up a couple of podcasts ago, but I just could not have a conversation with anybody that worked at a company of any size whatsoever and not have the idea of cloud lock-in come into the conversation. I’m not even interested in talking about cloud lock-in but somehow everybody was talking about it. Every single talk featured it in some way.

All these companies were talking about how they had features in their product that helped prevent cloud lock-in, and you might as well call it the Cloud Lock-In Con. Just to get into this conversation a little bit and go a little deeper, Chris, when we talked about this before, you talked about one of the ways that people arrive at the cloud and leave the cloud. Maybe you can tell us that story again.

Chris: I would say, for the most part, when starting out, whether you’re a startup or a new group that’s inside of a bigger company–taking advantage of cloud makes a lot of sense because it’s very quick startup time. You can have servers up and running in a matter of minutes as opposed to months if you have to go buy your own. Also, just from a cost standpoint, it’s so much more economical to run that workload in the cloud.

Running it in the cloud, you also get a lot of value-added services, all the various integration points and all the features that are offered by all the modern cloud providers. It ends up being a pretty fertile place to get going, ramp up, get to a level of scale. I think a lot of folks never get to the point where they’ve been so successful and grown so much that it now makes sense for them to think about getting off the cloud, because there is an inflection point.

At some point, you scale up enough, you’re using enough resources in the cloud where you’re now overpaying, and it’s going to be much cheaper for you to run it yourself, to go buy those servers, to have your own datacenter and to run it on-prem. That’s that inflection point. If you do get successful enough, if you’re running enough requests through your system, you have a high enough load to justify it, then that’s when you take that hard look at it and say, “You know what? It’s time to have our migration strategy to start moving this off of the cloud and onto our own on-prem datacenter.”

Again, we talked a bit before like if you’re at that point, you’re so successful, the capital is there and the financial incentive is there that you’ll do it. It’s hard work and it’s going to take a while but the financial incentive is there, the capital is there, and you’re going to do it. Don’t worry about lock-in from the get-go because it just complicates everything and, chances are, it’s not going to be an issue for you. If it is, then that’s actually a great thing and you should be really happy about it.
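As a rough illustration of the inflection point Chris describes, here is a minimal sketch of a break-even calculation. The cost model (a fixed on-prem cost plus a lower per-unit cost, versus a purely per-unit cloud cost), the function names, and every number are assumptions made up for illustration; none of them come from the episode or from any provider's pricing.

```python
from typing import Optional


def break_even_units(onprem_fixed: float, cloud_unit: float,
                     onprem_unit: float) -> Optional[float]:
    """Usage level at which cloud and on-prem cost the same.

    Toy model (an assumption, not from the episode):
      cloud cost   = cloud_unit * units
      on-prem cost = onprem_fixed + onprem_unit * units
    """
    if cloud_unit <= onprem_unit:
        # Under this model on-prem never becomes cheaper.
        return None
    return onprem_fixed / (cloud_unit - onprem_unit)


def cheaper_option(units: float, onprem_fixed: float, cloud_unit: float,
                   onprem_unit: float) -> str:
    """Name the cheaper side for a given usage level under the toy model."""
    cloud = cloud_unit * units
    onprem = onprem_fixed + onprem_unit * units
    return "cloud" if cloud <= onprem else "on-prem"


# Made-up figures, purely illustrative.
threshold = break_even_units(onprem_fixed=120_000, cloud_unit=10, onprem_unit=4)
small_workload = cheaper_option(5_000, 120_000, 10, 4)
large_workload = cheaper_option(50_000, 120_000, 10, 4)
print(threshold, small_workload, large_workload)
```

Below the threshold the fixed cost dominates and the cloud wins; above it the per-unit savings pay the fixed cost back. A real comparison would also have to account for staffing, migration effort, and discounts such as reserved instances, which this sketch ignores.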

Jon: From that perspective, cloud lock-in doesn’t really matter because no matter which cloud you might have chosen, when you’re getting off the cloud, there’s no AWS on-prem, or Google Cloud on-prem, or Azure on-prem. You literally have to decouple yourself from whichever cloud you chose. You’ve got to do that work no matter what. No strategy of putting yourself on both Google and AWS would have helped you.

Chris: There’s a little bit of nuance there, where AWS does have some stuff that lets you actually run their software on-prem, but it always requires that tie-in. At the end of the day, you’re still giving money to AWS. To actually be fully divested from that, you have to come up with that yourself.

Jon: Then, there’s another path. Chris just talked about the path where you’re maybe a startup or you’re a young company and you started in the cloud. Then, there’s this other path that I was seeing quite a bit at Gluecon, which is lots of telecommunication companies, cable companies, healthcare companies, big, almost monopolistic types of companies where they’ve been on-prem forever and then they started to get really awful at it.

Now, it takes weeks to requisition a new server, putting up a new environment is just like a headache involving 15 layers of management, and they just can’t get anything done. They see the cloud as a way to get back to some level of agility, and I think this does slightly affect a certain kind of company that maybe is not facing very much competition in their market and they’re able to get to this point without getting crushed by competitors that are more nimble and agile.

These companies, they might be CenturyLink-type companies or Charter Communications-type companies. Now, they want to get on the cloud and management all up and down the line is saying, “Well, make sure we don’t get locked in. We don’t want to have that happen to us again,” and I wasn’t sure. A lot of people had talked about being burned before. I should have dug more in. It was a matter of me not even being that interested in cloud lock-in and realizing later that every conversation had involved it. I don’t know how they actually got burned before. “You’re still here. You’re making a lot of money. How is it that you were burned from some kind of lock-in in the past?” Do you have any guesses of what that might have been, Chris? I’m just speculating at this point. You will be, too.

Chris: I would guess that it’s vendor lock-in in general. These are all companies with probably long histories, lots of IT. They’ve gone through many cycles of technology and dealt with many vendors. Given that length of time, they’ve absolutely worked with vendors that then went belly up, and then they had to scramble. This is very much a strategy for reducing risk. Don’t have a single dependency; have backups and multiple vendors providing that particular functionality that you need so that if one does go away or if one does raise prices to a level that you’re not comfortable with, then you have the option. I think that’s where that’s coming from.

Jon: I think you’re right. Let’s crush that notion. In your opinion, Chris, what is more risky: having a strategy of deploying to multiple clouds with the same functionality or picking a cloud?

Chris: I think for almost any company, the idea of building a system that can actually run in a hybrid cloud environment is so complicated. It’s so much overhead and, chances are, it’s not going to work optimally. It definitely won’t work optimally, I’m very skeptical of that, and whether it works at all, even whether it works adequately, would be a challenge in and of itself, I think. I would say there’s probably a lot of risk there.

These companies are so big that I think the two concerns might be that pricing is going to be so unfavorable that it feels like robbery, or that the provider is just not going to be available anymore; they’re going to go out of business. I think you just have to look at your cloud provider and really ask yourself how likely that is to happen. Amazon, as a company, doesn’t make much money. They’ve definitely prioritized growing their market and they’re going to make money based upon volume.

Their idea is land and expand so I don’t see a lot of risk in them saying, “Okay, we’re going to go and double, triple, quadruple pricing for folks.” That just does not feel like that’s in their DNA and what they’re trying to do. It also would harm their business; they would see a dramatic downturn and, for them, it’s all about scale. They’re making such huge investments in datacenters and computer equipment or whatnot.

If they see a falloff in usage, then they have a bunch of stuff sitting around, not being used, and that’s fairly costly. It’s not going to behoove them to raise prices so much that they alienate customers. As for them going out of business, again, ask yourself how likely that is to happen. If AWS went out of business, forget it; our economy has just collapsed. There are much bigger problems than your company at that point. Literally, just think about that. If AWS went away, what would happen? It would be the equivalent of setting off a bomb in a major financial center or something like that.

Jon: Just to pick on one thing that you talked about there a minute ago. You said it can’t run optimally so I just started to imagine. What might you try to architect that’s multi-cloud, that runs across clouds? The first thing that came to my mind was, “Well, maybe you have some micro-services and they’re in Docker and you put them on Kubernetes clusters.” You’ve got a Kubernetes cluster over in Google Cloud and another one in AWS. The first thing that came to my mind is, “Well, where’s the database?” If you’re going to have a persistent store, is that going to be on the AWS side or is it going to be on the Google side? Whichever it’s on, the other one is going to suffer from some latency issues. That was one thought that occurred to me.

Then, the other thought that occurred to me was, “Okay, let’s say each micro-service has to live in its own cloud but we’re going to sprinkle our micro-services around. These two micro-services will be on Google and these two micro-services will be on AWS.” Again, if those micro-services need to talk to each other, there could be some weirdness and latency issues, and you’d want to take advantage of certain security things that you get like VPCs and things like that but also, now, you just have to have teams that are knowledgeable about both.

At one level, it may seem like it’s just compute and it’s just DNS and load balancing, but each one of those clouds, even if you’re using Terraform, you have to know something about the API or something about how to navigate the management console, something about specifically how it works, and how monitoring works, and everything. You just more than double the requirements on your staff in terms of what they need to understand, in terms of how their infrastructure runs.

Chris: Absolutely. There are multiple facets there of the complexity of saying, “I want to run on a hybrid cloud environment,” so application architecture becomes a big issue, like, “How do I do that for some of the exact same scenarios you just talked about?” There is the operability of it: monitoring is going to be different in Azure than it’s going to be in AWS, so you have to build an abstraction on top of that to give you that insight as a whole.

Then, you have all these totally different cloud providers. Seven plus years of AWS experience and I’m still trying to keep up and learn about AWS. Throw Azure on that and throw Google Cloud on that and it’s like, “Whoa, that’s a lot.” You need to hire more people. It’s all in the name of saying, “I’m not going to be locked into a cloud provider.” If that were me, I’d have to take a step back and say, “What’s the worse evil here?”

Jon: Now, having said all this and having made a strong case, I think, that we shouldn’t be worried about cloud lock-in, there is a reality that people are and, as fascinated as we might be and as persuasive as we might try to be, we can’t convince them to not worry about this. Companies are putting it at the top of their list of things that they’re concerned about. I think that just realizing that this is happening and realizing that this is a hot topic is important and I think it’ll have an interesting effect on the market. My opinion is that this’ll have a commoditizing effect on the market and I think we’re already seeing it today. It’ll make APIs more standard across clouds. It’ll make the names of things and the hierarchies of features more standard so this market pressure could have overall benefits to everyone. Your thoughts, Chris?

Chris: Maybe, although in this landscape, AWS is the clear leader and they know that both Google and Microsoft are breathing down their neck, trying to take market share from them. I just don’t see how Amazon sees it to their advantage to work together with those companies to facilitate that stuff, so I just see people like Amazon innovating faster and faster and the others playing catch-up. To then try to have industry standards for these things and to interoperate, it becomes an intractable problem.

Jon: That’s also a good way of looking at that. We’ll have this conversation again a year from now and see what happens.

Chris: Maybe the takeaway is, definitely, my suggestion would be don’t worry about cloud lock-in. When it becomes an issue, you’re going to have the resources to deal with that. The other thing is, if it really is something that’s concerning and you need it just as part of an organizational buy-off, I think the best approach would be to use the proper level of abstraction such that you can have a backup datacenter in a different cloud provider.

Don’t try to run it concurrently; instead, just go build your application for one and have it hosted in one cloud provider. Know which cloud-specific things you are using. Those are the things that you would then look at abstracting out so that if you did need to say, “Okay, I’m not using that cloud provider. I’m going to another one,” you could do it. That would be, again, a lot of work, but if that’s what you need from an organizational standpoint, then that would be my suggestion for going about it.
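To make the idea of abstracting out the cloud-specific parts a bit more concrete, here is a minimal sketch of what such a seam could look like. Everything in it is hypothetical: the `BlobStore` interface, the `InMemoryBlobStore` stand-in, and the `save_report` function are invented for illustration and are not from the episode or any provider SDK. In practice, one adapter per provider would wrap that provider's own storage client behind the same interface.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical interface covering the one storage feature the app uses."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in adapter; a real adapter would wrap a provider SDK instead."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


def save_report(store: BlobStore, report_id: str, body: str) -> str:
    """Application code depends only on the interface, not on a provider."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key


store = InMemoryBlobStore()
key = save_report(store, "2018-06", "all clusters rebuilt")
print(key, store.get(key))
```

The point of the design is that switching providers means writing one new adapter rather than touching every caller. It does not make the application run in two clouds at once, which matches the advice not to run them concurrently.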

Jon: Great suggestion. Thanks again. This has been a fun one. Thanks, Rich. Thanks, Chris.

Rich: Thanks, guys. See you.

Chris: Later.

Rich: Well, Dear Listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources, is available at mobycast.fm/14. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.
