13. The Future of Serverless

What is the future of serverless? What is it good for? Who should be using it? Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache share their thoughts about this current trend, and they may not be on the same page when it comes to going serverless.

Some of the highlights of the show include:

  • Definition of serverless: Running your code without any need to provision computing resources. Someone else manages those resources entirely, and you care only about the software.
  • AWS Lambda was the first cloud serverless offering, and the beginning of the buzzword. The serverless concept has been extended as an option for AWS Aurora database, and for ECS as AWS Fargate.
  • Serverless seems like the next step for those coming from platforms as a service (PaaS) such as Heroku.
  • Core difference between PaaS and Lambda is the amount of control available. With Heroku, you can get under the hood and manage some of the resource configurations. You can’t do that with Lambda.
  • AWS Lambda promises magic: “deploy your code without configuring any servers and it will automatically scale to any workload, without limits.” In reality, it requires proper configuration and has finite limits based on cloud capacity and your software architecture.
  • Lambda is a good fit for event-driven systems, especially for simpler/smaller functions where the overhead of wrapping the function in a service would be overkill.
  • Lambda serves as a boilerplate framework for running a unit of code in the cloud in response to some event or request.
  • Event vs. API request: an event is generated by other services that don’t need synchronous responses; an API request is synchronous.
  • Common Lambda errors include runtime and scaling errors; you have to handle these whether you deploy on serverless or traditional infrastructure.
  • Another class of error is timeouts (five minutes on Lambda). You must do extra work for this on Lambda that you wouldn’t need on traditional deployments, because you won’t receive any error or stack trace when a timeout happens.
  • Spawn limits constrain how quickly your Lambda function can scale up to higher request volume.
  • Whether on Lambda or not, you still need to deal with the same set of problems associated with running production software in a distributed system.
  • Serverless is not a silver bullet; it solves some problems, but there are complicated issues at play that you still have to manage.
  • Lambda feels like going back to punch cards and mainframes; you give someone the work, and they come back with the results.
  • If there is an influx of Lambda usage, there will be an evolution of frameworks and third-party support.

Links and Resources:

Kelsus
Secret Stache Media
PRO Docker Training
AWS Lambda
Amazon ECS
Stackery
Nate Taggart’s Gluecon Self-Healing Serverless Applications presentation

Rich: In episode 13 of Mobycast, Jon and Chris discussed the future of serverless. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.

Jon: All right, welcome, Chris and Rich. It’s time for Mobycast. Every week I have said the episode number, but, like when Apple was done with the iPad 2 and decided to call the new iPad just “iPad,” I have lost track of the number, and from now on it’s just Mobycast. Welcome.

Chris: Hey, how’s it going guys?

Jon: It’s good, what have you been up to this week, Chris?

Chris: That’s a great question, actually. This has been kind of an unusual week, in that I’ve literally been focusing on one thing: the project that I’m leading. Just doing a lot of driving the ambiguity out of various ideas, turning them into features that we can act upon, and then executing against them.

There’s been no travel, no fires, nothing like that; just a pretty straightforward week, and those seem to be rarer and rarer, so I’m thankful for that.

Jon: Excellent and how about you, Rich?

Rich: This month was actually a record-breaking month for Secret Stache, and the month prior was as well. We had two back-to-back months of really nice growth as a result of this outbound strategy we’ve been running since January. The other side of that is that I’ve actually been outside more, away from my computer: I hit some golf balls, and two days ago we went on a hike. In the wake of that, I’m also finding myself with a little more balance than I used to have. This week wasn’t very productive, but it was nice to actually enjoy that reward a bit.

Jon: Yeah, speaking of outside, this week I got a chance to try out the new river wave feature that they’ve built in Eagle, Colorado. I got my surfboard out a couple of times and surfed on it. It’s super fun, and I’m so happy that it’s just a few blocks away. For the surfer in me, it’s a welcome surprise that I can do that in Colorado.

Unfortunately, this year we did not get enough snow, so I’m already sad that we’ve passed peak runoff, and it looks like my river surfing is going to be limited to maybe another week or so.

But in other news, on the work front: hopefully, and I’m sure of it, by the time you hear this there’s going to be a real class on prodockertraining.com that you can buy. You can come, you can attend, and you can learn, and you won’t just have to listen to us talk about this stuff; you can actually be with us in person and work with us directly.

Rich: Where is that workshop?

Jon: We’re going to start doing these in Seattle. We may eventually also do some in Denver, I’m not sure yet. But absolutely for sure, in Seattle.

Rich: We’ll put a direct link to that in the show notes.

Jon: We talked last week about all of the things that happened at Gluecon and sort of the state of the world when it comes to modern software deployment, containerization, and DevOps, and the conversation took longer than 20 minutes.

We didn’t get to everything, and one of the things that’s just a hot topic is serverless. We wanted to talk about it and really dig into it some more, so we thought we’d spend the entire time this week talking about serverless.

I think what Chris and I can try to do today is talk about serverless. Hopefully it’s been clear from our previous conversations that we’re maybe not entirely on the same page; we maybe don’t totally agree, which makes for a fun conversation. I want to figure out where the boundaries are: what is serverless good for, what is it not good for, who should be using it, and who shouldn’t be using it.

And for the people that are using it but shouldn’t be, what’s that going to cost? What’s going to happen with serverless if there’s this influx of people that shouldn’t be using it but are?

To get started, I’ve said this word ‘serverless’ a bunch of times and if you haven’t been listening over the weeks, it might be worth just getting a little refresher of what that is. Chris, can you tell us what serverless is?

Chris: That’s a great place to start, because that was actually the very first thing that came to mind. We first have to have a definition of what serverless means, because I think it’s definitely mutated over the years as things have evolved.

In general, serverless means that you’re running your code on servers that you don’t have to manage. That’s where the “serverless” part comes in.

There are still absolutely servers, you can’t just run bits of code in the ether, they actually have to run on computers–servers; someone has to manage it, but when we say serverless, usually we mean that someone else has the headache of managing that infrastructure and not you.

They basically are providing computing resources as a service and your code goes on them. In the most general sense, that’s what serverless means.

I think it was around 2014 when AWS came out with Lambda, and that was the beginning of the buzzword serverless. At that point in time, serverless really equaled Lambda, but we’re now starting to see the word serverless applied to lots of things: RDS Aurora, for instance, now has what they call a serverless option.

ECS now has a serverless option called Fargate. We’re starting to see this word leak out into any scenario where you’re not managing the servers yourself; someone else has the headache of provisioning them, scaling them, monitoring them, troubleshooting them, applying updates, and so on.

Jon: Thanks for that. I’ll also say it’s been my experience that a lot of the people that jump into serverless are ones that came from platforms as a service, folks that have done some monolithic Python apps and Rails applications and maybe used Heroku. This is a really attractive-looking option because it seems like the next step on the general trajectory.

Heroku is platform as a service; all you have to do is tell Heroku, “Here’s my GitHub repository, go deploy it.” Lambda feels sort of similar: “Hey Lambda, here’s my function, go deploy it.”

I think one of the core differences, though, is that with Heroku, even though you weren’t necessarily managing a server, you still did a lot of management of the workload the servers were capable of handling, by turning workers on and off.

But also, Heroku didn’t prevent you from getting in there and seeing what was going on. With a Rails application, for example, you could still run a Rails console, which is like a live terminal into your server, so you could poke around, run functions, and see what’s happening in terms of memory usage and a few other things. With Lambda, it’s not the same.

With Lambda specifically (not Fargate or Aurora Serverless), you call a function, the function executes on servers that you don’t have any control over, and you get back a return value, a timeout, or an error, and that’s it. You don’t get to log in to Lambda and see how it’s doing; there’s nothing like that whatsoever.

I think that’s a real core difference between platform as a service and Lambda specifically, in terms of serverless architecture and running applications, that maybe catches some people off guard.

Before we continue, I wanted to mention a talk at Gluecon by Nate Taggart, who’s really excited about serverless; he’s the CEO of a company called Stackery. He gave a talk called Self-Healing Serverless Applications.

I’ve never seen such a wild reaction at a developer conference; I think every single person in the room put their phone up and took a picture of this live. It was ridiculous; he was a rock star for a moment.

What he did was, he said, “Well, this is what Lambda advertised itself as in the beginning. When we first learned about Lambda, here was what was on AWS’s page about Lambda.” It said, “AWS Lambda invokes your code only when needed and automatically scales to support the rate of incoming requests without requiring you to configure anything. There is no limit to the number of requests your code can handle.”

That sounds like magic and if somebody can provide that to me, I think we can all just turn off our DevOps teams and just do that.

But when he actually started using Lambda, he decided on a few edits, and he showed them on a slide as though an editor had marked it up with a red pen. He changed the slide to read, “AWS Lambda invokes your code sometimes when needed and can scale to support certain rates of incoming requests, but requires you to properly configure everything. There are limits to the number of requests your architecture can handle.”

Everybody just loved that, and it’s absolutely true. When you start to use Lambda, I think that’s what comes out: it’s not magic, it’s computers doing what they’re told.

Rich: Hey, this is Rich. You might recognize me as the guy who introduces the show, but is pretty much silent during the meat of the podcast.

The truth is, these topics are oftentimes incredibly complex and I’m just too inexperienced to provide much value.

What you might not know is that Jon and Chris created a training product to help you develop these skill sets and get caught up to speed on AWS and Docker.

If you’re like me and you feel underwater in these conversations, head over to prodockertraining.com and get on the mailing list for the inaugural course. Okay, let’s dive back in.

Jon: I think where I want to go next with this conversation is, maybe you could give us your take, Chris, on when your mind says, “Let’s do this with serverless.” Let’s talk mostly about Lambda; I am interested in hearing more about Fargate at some point, but let’s stick with Lambda for now. When does your mind say, “Lambda is the solution, Lambda is the right answer for this”?

Chris: I think for me the magic is where there’s an event. When there’s something in my system that is reacting to an event, that may very well be a good candidate for saying this should be hosted on Lambda.

I think one of the advantages of Lambda is when you need to get something done that is pretty straightforward, simple, isolated, and contained, and the overhead of creating the scaffolding to wrap it in a service feels like overkill and a lot of work.

I think the way I look at Lambda is that it’s kind of providing you that boilerplate framework foundation–the scaffolding that you need in order to run code in the cloud.

If you’ve got something that’s pretty targeted, and especially if you’re leveraging other things in AWS that are emitting events and you want to act on them, do transformations, or trigger other things, then Lambda is right at the top of the list of things to look at.
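
As a concrete illustration of that kind of event-driven fit, here is a minimal sketch, assuming a Python Lambda subscribed to S3 object-created notifications; the logging-only body is a stand-in for whatever transformation or downstream trigger you would actually run.

```python
import json
import urllib.parse

import boto3  # bundled with the AWS Lambda Python runtime

s3 = boto3.client("s3")

def handler(event, context):
    """React to S3 ObjectCreated notifications.

    A sketch only: a real function would transform the object or
    trigger downstream work instead of just logging metadata.
    """
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event payloads.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        print(json.dumps({"bucket": bucket, "key": key,
                          "bytes": head["ContentLength"]}))
    return {"processed": len(records)}
```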

Jon: In my mind, I have a hard time really knowing what the difference is between–and I want to go right into it, what’s the difference between an event and an API request?

Chris: For some people there’s not much of a difference, but for me, I do see a delineation. I would say typically like when I say event, I’m kind of thinking more along the lines of events that are generated by other services that are part of my architecture.

They’re things that don’t necessarily need real-time responses. I think of them as more asynchronous as opposed to synchronous. You absolutely can build an API server composed entirely of Lambda functions fronted by API Gateway, and you can think of those incoming requests as events.

For me, though, I’m definitely more of a fan of thinking of it like this: I need to take actions based on what’s happening in the state of my system, so I have the other components send events that trigger these things that don’t need to be done synchronously; they can be done asynchronously.
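
One way to see the delineation Chris draws is in how a caller invokes a function with boto3: the same Lambda can be fired as an asynchronous event or called like a synchronous API, and the only difference is the InvocationType parameter. A minimal sketch (the function name and payload are hypothetical):

```python
import json

import boto3

lam = boto3.client("lambda")
payload = json.dumps({"order_id": 42}).encode()  # hypothetical payload

# Event-style: fire and forget. Lambda queues the invocation and the
# call returns immediately; no function result comes back to the caller.
lam.invoke(FunctionName="process-order",  # hypothetical function name
           InvocationType="Event",
           Payload=payload)

# API-style: synchronous request/response. The caller blocks until the
# function finishes (or times out) and receives its return value.
resp = lam.invoke(FunctionName="process-order",
                  InvocationType="RequestResponse",
                  Payload=payload)
print(json.loads(resp["Payload"].read()))
```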

Jon: Right. I think we can get at why using Lambda for API requests is maybe not the right solution another way, too. Nate’s talk was just so good; I’m looking forward to actually catching up with him and talking through some of this in more detail. One of the things he did was lay out the common types of errors that come out of Lambda, and there are two main types: runtime errors and scaling errors.

For the runtime errors, he laid out three, but the two that I want to mention here are uncaught exceptions, where your code barfs and Lambda says, “Hey, your code barfed,” and timeouts. Timeouts are really significant.

Lambda’s maximum timeout is five minutes; anything longer than that is absolutely guaranteed to time out, and anything shorter than that will time out if you haven’t manually set the timeout to five minutes.

In a lot of cases you will want a shorter timeout; for an API especially, you’d want to time out in the range of five seconds, to dozens of seconds at most. Those two types of errors, you have to do something about them, and as Nate suggested, with uncaught exceptions, obviously, you have to manage those no matter what you’re doing.

It doesn’t matter if you’re on Lambda, ECS, or Heroku, you’ve got to make your code resilient against uncaught exceptions. But with timeouts, you largely don’t have to think about the infrastructure kicking you out when you’re running on ECS, Heroku, or a platform as a service.

You have to do extra work to make your code resilient against timeouts. That’s just a trade-off: you get some of the infrastructure for free, but now, all of a sudden, you have to do this work. Nate’s suggestion, which I thought was brilliant, was to wrap your code in pre-timeout indicators: “Uh-oh, it looks like the five minutes is coming up.” You have an internal thing happen in your code that tells you, “I’m stuck, here’s where I’m stuck, I’m about to time out,” so you know what’s about to happen, and then you return, as opposed to just letting Lambda shut you down.

I thought that was brilliant, but that’s work; there’s no way around it being work and complexity that you have to add to your code, and that you may not have to add if you’re running somewhere else. I thought that was kind of interesting.
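
A minimal sketch of that pre-timeout idea, using the get_remaining_time_in_millis() method the Lambda Python runtime exposes on the context object; the ten-second margin and the item-processing loop are invented for illustration, not Nate’s actual implementation:

```python
import time

SAFETY_MARGIN_MS = 10_000  # invented margin: bail out 10s before the hard timeout

def process(item):
    time.sleep(1)  # stand-in for real work
    return item

def handler(event, context):
    """Return a diagnostic before Lambda kills the function silently."""
    items = event.get("items", [])  # hypothetical work list
    done = []
    for item in items:
        # The runtime reports how much of the configured timeout remains.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Emit our own "I'm stuck here" log and return cleanly, so the
            # caller gets state back instead of a bare timeout with no trace.
            print(f"about to time out: finished {len(done)} of {len(items)}, "
                  f"stuck at {item!r}")
            return {"status": "timed_out_early", "done": done, "next": item}
        done.append(process(item))
    return {"status": "ok", "done": done}
```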

The other types of errors that happen in Lambda are around scaling. Lambda has concurrency limits and spawn limits. Concurrency limits are how many things can run at once; you can have AWS change those, and you don’t necessarily know what they are, but you get errors when you hit them. Spawn limits are how fast you can scale up: how many new Lambda functions you can spin up in a short period of time.

Both of those just return errors to you, and here again, your code has to be ready for them. I think if you’re building a service that can scale, and you’re doing it with Docker on ECS, you can plan in advance for that, and you don’t have to be surprised by, “Oh, we hit this mysterious scaling limit of AWS, and now we have to react to it in a strange way that’s very different from just adding horsepower.”

Sometimes, in terms of brain damage and the amount of effort it takes, it’s easier to add horsepower to an ECS cluster than it is to work around some arbitrary and weird scaling limits that you have to discover on your own inside Lambda.
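
Because those scaling limits surface to the caller as throttle errors, client code has to be ready to back off. A sketch of one common pattern, retrying on the Lambda client’s TooManyRequestsException with exponential backoff and jitter; the retry budget and sleep schedule are illustrative assumptions, not tuned recommendations:

```python
import random
import time

import boto3

lam = boto3.client("lambda")

def invoke_with_backoff(name, payload, max_attempts=5):
    """Retry throttled invocations with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return lam.invoke(FunctionName=name,
                              InvocationType="RequestResponse",
                              Payload=payload)
        except lam.exceptions.TooManyRequestsException:
            # Concurrency or spawn limit hit: wait longer each time, with
            # jitter so a burst of callers doesn't retry in lockstep.
            delay = (2 ** attempt) + random.random()
            print(f"throttled, retrying in {delay:.1f}s")
            time.sleep(delay)
    raise RuntimeError(f"{name} still throttled after {max_attempts} attempts")
```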

That was a bit of a long explanation of the errors and the complexity you have to add to your own code to deal with them, but do you think that’s another way of approaching this decision, Chris?

Chris: I think the big takeaway here is that you still have all the problems of running production software in a distributed system, regardless of whether you run it on Lambda, where someone else is running the servers, or on bare metal yourself; you still have to deal with all this stuff. What about code that takes a long time to run? You have to deal with that same problem whether you run it on Lambda or on ECS.

On ECS, your ELBs are going to time you out after 60 seconds by default. If your code is taking longer than that to respond, it’s going to get cut off by the load balancer, and you’re still spending resources on the backend; the work may not time out, but it’s still happening.
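
That 60 seconds is the load balancer’s default idle timeout, and, unlike Lambda’s hard limit, it is yours to change. A sketch of raising it on an Application Load Balancer with boto3 (the ARN is a placeholder for your own load balancer’s ARN):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Raise the idle timeout from the 60-second default to two minutes.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/app/example",
    Attributes=[{"Key": "idle_timeout.timeout_seconds", "Value": "120"}],
)
```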

Of course, if you’re running up against these timeouts, that’s when you start looking at your code and saying, “There’s probably something wrong here.” If I’m actually putting a timer in my code to say, “I know I am about to be killed, so I’m going to throw a timeout so that I can go ahead and log something,” maybe instead of writing that code, I should be asking, “Why is this taking five minutes?” That’s probably not the right way to do it, right?

This is no silver bullet; serverless is no silver bullet. You still have to be at the top of your game. There are a lot of complicated issues at play, and it solves some problems, but you still have to deal with these things regardless of where your code is running.

Jon: Just to say one more thing about the timeout code, about having your functions say, “I’m about to time out,” and having to do that yourself instead of relying on Lambda: I wanted to give a little more of the reasoning behind that.

When Lambda times out your function, you don’t get its backtrace; you get nothing. Let’s say you’re scaled up to 12,000 calls per minute of your Lambda function. Even if you’ve written the best code, it’s going to time out sometimes.

There’s going to be a database connection issue, or a DNS issue, or something else that times you out. It just always happens, and that’s why I thought that solution was a little bit brilliant.

Again, it’s a solution that you may not have to build if you’re managing your own infrastructure and have your own control over timeouts, as opposed to letting Lambda just shut you down with no information. It’s like that point about the load balancers being able to do that, too. Someone’s going to turn you off somewhere and not tell you why.

Chris: Absolutely, and in a distributed system there are multiple actors, and each one of those has its own rules and regulations on that stuff. It gets pretty complicated pretty quickly.

Jon: Right. And then a personal story: recently I was looking at deploying some code, and I was torn between doing it in Lambda and doing it in ECS. Interestingly, because of all the things we’ve just talked about, you don’t get everything for free: you don’t get logging management, you don’t get library management, you don’t get deployment, you don’t get all that stuff for free with Lambda. In my case, I needed a fairly substantial library for my code to be able to run, and it wasn’t available as a drop-down, pick-this-library kind of thing in Lambda.

I would have needed to pack it up, put it in a zip file, and get it over into Lambda, and I thought, “Well, if I’m going to do all that, why don’t I just put this in a Docker container and put it on ECS?” That trade-off was a no-brainer for me: I went with ECS, because then I would have more control over things.

Maybe the argument for Lambda is that the whole thing might have been a little cheaper, because you only get charged when the function actually runs, but we weren’t talking about a break-the-bank amount of money either way. Running it on ECS was totally affordable as well, and it gave me more control and more access to what’s happening.

We’ve basically talked about this for 20 minutes, and the takeaway is: use Lambda for events, and go ahead and use it for applications if you want, but don’t expect it to do anything magic for you. I think that’s probably what we’ll do; we’ll continue to use it for events.

Do you have any other thoughts on serverless, or on this conversation, that you think will put a different light on it or add to what we’ve already said, Chris?

Chris: One of the things that comes to mind is that I continue to be blown away by the cyclic nature of technology. You can think of Lambda as almost going back to mainframes and punch cards: “Here’s my punch card, my Lambda function. I’m going to give it to you, you go run it, and I get back the results.” It’s the same with client-server; this kind of feels like batch computing in a way. It’s an interesting way of thinking about it: what’s old is new, and what’s new is old.

Jon: There is something from the beginning that I wanted to come back to for just another minute.

I talked about, “Well, what is going to happen if the masses, who may not be thinking as carefully about how distributed systems should work as would be helpful, all pile on and write applications on Lambda, put them into production, and try to maintain them, and there’s just this influx of Lambda usage? What do we expect to see?”

Chris: Growing business for us.

Jon: That can be a part of it. I guess some optimistic part of me hopes that some of this is stuff you can just do for people, and that there may be a future where it’s like, “Yeah, just think about your business logic. Just worry about your business rules.” I don’t think we’re anywhere close to that yet.

Chris: The ecosystem continues to evolve and provide support around those areas. It’s happened with Lambda already: to start off with, it was very bare bones, and now there are frameworks around it to give you a lot of that support and make things easier.

It’s like the trade-off of going with a highly opinionated solution versus a more flexible one: the more opinionated you go, the more support you get and the more stuff you get out of the box, versus doing it all on your own. I think we’ll see that same evolution happen in the serverless world.

Jon: Great, I think that’s a good way to wrap it up. Thank you so much Chris, and Rich, we’ll talk to you next week.

Chris: Thanks guys. Bye.

Rich: Well dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online.

This episode along with show notes and other valuable resources is available at mobycast.fm/13. If you have any questions, or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.
