46. Revisiting the Serverless Holy War
Jon Christensen and Chris Hickman of Kelsus and Rich Staats of Secret Stache revisit the “Holy War” of serverless API development and whether it’s worthwhile for projects.
Some of the highlights of the show include:
- Serverless Definition: Functions as a service, or code running on infrastructure managed by someone/something else
- Progress and popularity of Lambda-based architecture; serverless isn’t only Lambda, understand pros and cons of other tools to determine if Lambda is the best option
- AWS offers products and services that let developers assemble and build things without needing deep knowledge
- Challenges of using only AWS services for a project; getting all the pieces to work together can be complicated and time consuming
- Why are you doing serverless? Benefits and reasons for going serverless or not
Links and Resources
Rich: In episode 46 of Mobycast, we revisit the holy war of serverless API development and whether or not it’s a good idea for real-world projects. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.
Jon: Welcome, Chris and Rich. It’s another episode of Mobycast. Back in June, we did an episode of Mobycast where we talked about serverless. It’s a big topic. Actually, last week, we talked about serverless, too. We talked about Lambda and some of the new things that Lambda offers with layers and the runtime API. We always have this ongoing thing where it feels like I’m the proponent of serverless architectures saying, “Hey, we should do this. We should try this and prove to ourselves that it is doable, and maybe we’ll like it,” and, “Stop being such an old fuddy-duddy, putting your head in the sand and not realizing that progress means Lambda-based architecture.”
We thought in this episode that we’d revisit that argument where I’m right, and Chris is right, and we’re going to find out who’s really right. Should we be doing serverless-based architecture? It’s been a while and AWS moves slightly faster than the speed of light so, yeah, maybe it is time now. I just hate the fact that you cannot have a conversation about serverless without defining it, but it is a fact. Let’s define it. Chris, can you tell us for the purposes of this conversation how we’re defining serverless?
Chris: Let me see if I can come up with it. There are two broad ways of looking at this. I think, in general, when you say serverless, most people are thinking Lambda: everything is functions as a service. Really, we keep hearing the term “serverless”. It’s just everywhere. We’re inundated with it from cloud folks, from tech communities, from marketing folks, and when you hear that term, it’s not just functions as a service; it is really the idea that your code is running somewhere and you’re not managing the infrastructure. It’s anything where you don’t have to spin up a machine. If you don’t own that machine, if you’re not managing that machine, then it’s serverless.
This broadens the definition so it’s no longer just Lambda; it’s now also things like DynamoDB, S3, or SQS. All of these are managed services. You don’t have to spin up any resources whatsoever; you just consume them and use them, and away you go. I think that when we talk about serverless, it’s not just Lambda, because there are problems with only using Lambda, like saying, “I’m just going to go all in. Everything is now a function. Whatever app I’m doing, whatever I’m writing in code, it’s going to be Lambda. I’m serverless. Serverless is great.” That’s where the hairs on the back of my neck stick up, and that’s where I start really getting uncomfortable and saying, “Wait a minute. Slow down.”
Jon: For the purposes of this conversation, we are talking about that Lambda-based architecture. Basically, it is a really popular thing now to say, “I’m going to put a front door on my backend. It’s all going to be via API Gateway and Lambda,” and then, behind there, there might be DynamoDB, there might be Cognito, and other managed services, so that you have an entirely serverless architecture and you don’t have any Docker images that you’re spinning up on any kind of orchestrator or anything.
Those architectures are super popular right now. A lot of blue sky projects are getting started with those architectures. AWS appears to want us to use that architecture. Wouldn’t you agree? They seem to want people to do that. They’re kind of selling it.
Chris: We’ve talked about this before. AWS is customer-fanatical, so they’re going where customers are asking them to go, and there are a lot of folks out there for whom understanding what it means to run a cluster in ECS is just not something that they want to go do. For them, serverless is just writing some functions, stitching some stuff together, and almost building applications at the solution-architect level without having to dive down into deeper development tasks.
It’s almost like an IKEA project of putting together a bookshelf as opposed to going out into the shop and starting off with raw lumber. I think that the market is pretty big for that. There are folks out there that don’t have the deep knowledge to go build those kinds of things. I would just say AWS is certainly not discouraging Lambda.
Jon: They’re definitely adding features for people that do that. Whether they’re saying, “Hey, everybody should do this,” they’re at least not discouraging it.
Chris: There’s absolutely a place for this, too. There are certain things where using Lambda makes all the sense in the world, and you should totally do it. There are other things where using Lambda doesn’t really feel right, and you really shouldn’t be doing it. The key is to understand what it is good for, what the pros and cons are, what things get easier, what things get more difficult, and then make the right decision rather than going by what the marketing folks are saying with, “You’ve got to go serverless. It’s the new container.”
Jon: Controversial statement here: Marketing folks include developer advocates.
Chris: Yeah, absolutely.
Jon: At the end of the conversation we had in June, we basically said that, at least in terms of Kelsus or the types of projects that we do (which include a lot of blue-sky work and a lot of work for small to medium-size workloads, where we’re talking in the thousands of users a day but typically not in the millions), we weren’t really ready to front our microservices with Lambda and API Gateway. We still wanted to use things like Express, put them in Docker images, and put those onto an orchestrator, but we did want to use serverless for event-driven architectures where maybe SQS and SNS are involved and something changes and a lambda function says, “Hey, make those changes in the database or go kick off this little process.”
That’s where we ended up in June. I sort of felt a little dissatisfied because I thought that, as a company, there’s this movement happening and we’re not really jumping on and we’re not experts at it and we could be left behind. I just didn’t feel like I knew enough from our conversation to really agree that this wasn’t going to be the way we’re architecting applications in the future. In December, I said, all right, I’m going to go figure this out myself.
Actually, while I was at re:Invent, I just had this flash of an idea and I was like, I can build this so fast. I can stitch together an application with all these AWS components, and it’s going to be useful for us. I’ll open-source it and it’s going to be awesome, and I’m never going to touch a server. I’m never going to touch infrastructure, and then Chris is going to look at what I’d built and be like, “Oh my god, I was wrong,” so then I started building it.
What this application does is solve a little pain point that we have at Kelsus. We run a lot of applications, and we just need to know how those applications need to be started and stopped and, if certain common problems happen, what you should do to troubleshoot them. We have to write down those answers, and those are called run-books, at least in our vernacular. They may be called other things in other organizations. We have a bunch of run-books and they’re all out of date because nobody likes writing run-books. They’d rather write code.
I thought, a lot of times, the best way of doing a run-book would be a quick screencast of yourself. You can fire up QuickTime on your Mac and say, I’m doing a screen recording, and then you can talk during it so you can say, here, I clicked in this and that’s going to show me that. Your voice is recorded for posterity, and people can watch that and know exactly how to start and stop a service or whatever you need to do.
I was like, why don’t we use those for run-books instead of having to write them? It may take 15 minutes and you’re done, versus the two hours I’m thinking writing takes, especially because, for a lot of our developers, English is their second language, and we don’t really want to have run-books in Spanish because then the owners of the company might have a little bit of a harder time reading them. This was going to be the solution. Then I thought, if you have a pile of screencasts, how do you know which one you need to look at?
It sure would be nice if you could search those, and that’s when it occurred to me: AWS has a transcription service, Amazon Transcribe. We can transcribe the screencasts. We can then make those transcriptions full-text searchable and then, boom, we have a Google-like service for all of our run-books, so you could just hop in there and search for “ECS restart”, and all of the run-books that include the words “ECS restart” are shown on your screen. You can pick the one you want to watch. That’s a weekend project.
Chris: A few hours.
Jon: I started building it, and my goal was to not use anything that’s not AWS unless I had to. That means for our repo, I was using CodeCommit. For my CI/CD pipeline, I’m using CodeBuild, CodeDeploy, and CodePipeline. For my IDE, literally, I’m using Cloud9 instead of using vi or Visual Studio. I’m using Lambda and API Gateway, so I’m not using ECS, but that’s because I’m trying to do a serverless thing. I’m using DynamoDB for our data storage.
I’m using Amazon Transcribe for transcription. I’m using S3, obviously, because you never do a project without S3. I’m using Cognito for user storage, and I’m using one other one. It’s a freaking Lego kit of AWS services that I’m pouring into this thing, and I started building it. Before I tell you what I ran up against, what do you expect I might have run up against, Chris? I’m just curious. You’re hearing this and you’re like, “I know what that would be like.” What do you think it was like?
Chris: I think, actually, this particular project, the way that you describe it, feels like a pretty good fit for Lambda and for the tech stack that you’re talking about. At the end of the day, this is probably more of an event-driven system: a screencast gets created and maybe gets dropped in an S3 bucket, which triggers an event that kicks off inserting it into the catalogue, and then it goes and does the transcription in the background and puts that into some knowledge base or something like that.
As you mentioned, this is more one of those IKEA-type applications where you’re using all these great building blocks and assembling them together in the right way, stitching together Step #1, Step #2, and Step #3 to build something. From that standpoint, it sounds like a pretty good fit. You’re not trying to implement a RESTful API service, and it doesn’t sound like there’s a lot of complicated business logic code and other dependent libraries and whatnot.
That said, I’m sure that there are plenty of issues with just going and doing it, given that there’s a lot of ramp-up here. This is an alphabet soup of a bunch of different services, and you have to get them all to work together correctly. In theory, this is one of those canonical examples of what serverless and AWS technologies are good for but, in practice, actually getting all of that right is not so easy, it’s not straightforward, and there are a lot of gotchas. Things like debugging and troubleshooting are totally different in this world, and you’re going to be banging your head against the wall for a bit, I would imagine. That’s my impression.
Jon: Yeah, and that’s pretty close. I would say that you’re absolutely right for the piece on transcription: you put a screencast into an S3 bucket, that triggers an event that calls a lambda function, which then does something to transcribe it and maybe makes that searchable somehow. That piece is very event-driven. I think where I made a mistake is that we could have just said, “Hey, every developer: you can just go into S3, look at this pile of stuff, and figure it out yourself,” but, instead, I was like, well, I’m going to build a web application to front all of this, and then you’re going to be able to sign up for it, log into it, look at a list of run-books, add a run-book, delete a run-book.
Now, all of a sudden, we’ve got a run-book API that’s a CRUD REST API, and I built that in Lambda as well. It’s serverless with Lambda stuff in front of it. I think that’s where I went down this rabbit hole of pain, actually. I’ll just come out and say this: It was painful. I’ve experienced doing this exact thing in Java, then in Ruby on Rails, then in Sinatra with Ruby, then with Python, so I’ve done this exact project many times myself, building a very simple API as the backend to a web application, and this is the hardest it’s ever been.
That’s really important. I need to put an exclamation point on that. It is hard. It took me a long time. I remember in 2006, going to some little meetup where some guy was like, “I’m going to show off this new thing called Ruby on Rails,” and then he worked in this little vi terminal for 15 minutes and he’s like, “Now, I have a running API,” and that was it. Maybe it wasn’t quite 15 minutes, but that’s the point I’m trying to make.
I spent hours, and hours, and hours getting just a simple CRUD API together. It’s not like I was configuring and hooking up everything inside the console. I was using this thing called the Serverless Framework, which is available at serverless.com, and it handles all of the heavy lifting for you. It creates CloudFormation templates for you, so all you have to do is put in some configuration to say what you want the names of your functions to be, whether you want them to support CORS, and whether you want them to do things like return specialized headers.
It just has a bunch of YAML configuration that you have to do and then, once you do all that YAML configuration, it will go and create CloudFormation templates, and then you can type the command “serverless deploy”, which will connect to your AWS account, run those CloudFormation templates, and stand the stacks up for you. It turns out that a single CRUD API that connects to a DynamoDB database and also uses Cognito produces probably 1,500 lines of CloudFormation. That’s a lot. Every single one of those lines has something that could possibly go wrong that you need to troubleshoot.
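The kind of YAML configuration described here might look something like this hypothetical sketch of a serverless.yml for a single list endpoint; the service name, runtime, region, paths, and Cognito ARN are all invented:

```yaml
# Hypothetical serverless.yml sketch; every name and ARN here is invented.
service: runbooks-api

provider:
  name: aws
  runtime: nodejs8.10
  region: us-west-2

functions:
  listRunbooks:
    handler: handlers/runbooks.list
    events:
      - http:
          path: runbooks
          method: get
          cors: true   # have API Gateway answer the CORS preflight
          authorizer:
            arn: arn:aws:cognito-idp:us-west-2:123456789012:userpool/us-west-2_EXAMPLE
```

Running “serverless deploy” against a file like this is what generates those hundreds of lines of CloudFormation behind the scenes.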
Chris: Probably 800 of those lines are roles and policies being created.
Jon: Exactly. When things went wrong, they went wrong fantastically. You see your CloudFormation stacks start to roll back, and then some stuff doesn’t roll all the way back, and this is just stuff you don’t have to think about or worry about when you are building with a framework where everything is self-contained, like Express, Django, or Rails. Everything is contained: you start the framework process, the framework process exposes the API, and everything can see everything else within that process, so it’s very, very easy to see where things are and, when things go wrong, it’s very clearly reported. There aren’t many different pieces where, if something goes wrong in one piece, it might manifest itself differently in another piece.
That’s abstract, so let me give a specific example. You and I were talking about this yesterday, Chris. You hopped online with me because I needed some help troubleshooting, and we were looking at something where my React application was trying to call a list API to just get a list of run-books, and it was getting a 502 error from the service. When we looked at the network tab, there were references to CloudFront in there. Is there something wrong with my CloudFront distribution?
I was also able to look at my lambda function and see that it was getting called, so the front door is open, and I’m just like, okay, everything seems to be set up right, but then, while you were on the phone with me, I was like, look at that. In the console, there’s a CORS error. Okay, I must just have an issue with my CORS setup. Then, I go around and I’m looking at my CORS setup and it’s like, gosh, there’s nothing I can see in anything that I’ve done where I’ve left out some CORS configuration, and my lambda function is clearly, clearly, clearly returning CORS headers, absolutely 100% returning CORS headers.
What is the deal? Why is it complaining about CORS? After I got off the phone with you, Chris, I realized: it’s returning CORS headers but, in the case of an error, the CORS headers aren’t coming through. The 502 error that API Gateway was returning via CloudFront did not have CORS headers. It was the error that didn’t have CORS headers, not my lambda function. My lambda function was returning a 200 and had CORS headers in it but, for some reason, API Gateway was returning a 502 that didn’t have CORS headers in it.
The whole thing was confused a little bit by the fact that my API Gateway endpoint was edge-optimized, which means it serves your API through CloudFront at the edge to make things a little faster. That’s confusing because CloudFront has a lot of switches and a lot of things you can play with, and when you see that word “CloudFront” in an error message, it makes you feel like you may have made a mistake there.
Anyway, what I finally figured out was that my lambda function’s return body was a JSON object, not stringified JSON, which was causing API Gateway to say, “No, that’s not valid. You have to stringify the JSON in the body, and if you don’t stringify the JSON in the body, then you’re considered a bad gateway,” and that’s where the 502 error comes from. Since API Gateway was creating the 502 error and it was not adding CORS headers to it, that’s why, in the console, the whole thing looked like a CORS issue when it was really just a syntax issue in the return value of the lambda function.
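A minimal sketch of that bug and its fix, assuming the standard Lambda proxy integration (the handler and payload here are made-up examples):

```javascript
// Sketch of the bug and fix described above, assuming Lambda proxy
// integration; the handler and payload are made-up examples.

// Wrong: "body" is an object. API Gateway rejects the response and
// returns its own 502 Bad Gateway, which carries no CORS headers, so
// the browser console reports the failure as a CORS problem.
// return { statusCode: 200, body: { runbooks: [] } };

// Right: stringify the body and attach the CORS headers yourself.
function ok(payload) {
  return {
    statusCode: 200,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: JSON.stringify(payload)
  };
}

exports.handler = async () => ok({ runbooks: [] });
```

The misdirection comes from the gap between the two: the function’s own 200 responses have CORS headers, but the gateway-generated 502 does not.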
That’s just a lot of layers of troubleshooting and a lot of misdirection, and that’s just one of 50 things that I had to troubleshoot that were similar to that. Now that I’ve been through this process, I could do it a lot faster, but not 10 times faster. That’s one thing I told you yesterday: I think I could do it twice as fast, but there would still be things that I’d run up against that are just head-scratchers.
Chris: There are definitely nuances still waiting out there for you as you continue working on this project, for sure. This just highlights the fact that, with serverless, and specifically with using Lambda functions as a service to go build an application, there’s a lot of technology there. There are a lot of building blocks with which you can build very powerful, highly scalable, very cost-effective applications but, right out of the gate, you’re building a complicated distributed system, and that’s what you were getting at.
With the Ruby on Rails thing, that’s not really distributed; it’s one component. It’s all contained, and you don’t have these interfaces and dependencies. It’s really an atomic unit: you might have some backing database, but that’s probably about it. It’s pretty simple. It allows you to get going very, very quickly, and it is a simple, closed system versus something like what you’ve built now, if you draw the architecture diagram.
Think about it. You have at least 10 major services on that diagram, and you have multiple communication paths. Because of the way that things work in the cloud, you need to make sure that you have the right roles and policies set up for every one of those bi-directional interfaces between each of these things. There are just a lot of I’s to dot and a lot of T’s to cross. This is why it comes back to: what are the benefits of serverless? Why are you doing serverless?
One of the big value propositions for serverless is that it’s very cost-effective. You can serve hundreds of thousands or millions of requests very, very cheaply. You’re not paying much for infrastructure, but you do have the development costs, the engineering costs, and so you have to ask yourself: am I really in a position to take advantage of that? Is the extra work that I’m going to put into building this going to be offset by those savings, or is this just not really applicable to me, and am I actually just doing this for the sake of doing serverless without really getting the benefits out of it?
Jon: Do you think that 30 of us will be needing to do tens of thousands of requests per minute with low latency all across the world?
Chris: Not unless we sign up Amazon as a customer. We’re all ears for that. Most of the stuff that we’ve seen is not anywhere near that level and, honestly, there are not a lot of folks out there that have that scale, either.
Jon: That’s what’s interesting, and maybe that’s part of the draw to it. It’s like, yeah, in about a month, I was able to build something that could scale to that level of usage, and that’s cool. It’s like, whoa, I built a Formula 1 car and it only took a month. But it did take a month and, if I had built a nice Honda, it might have taken a day. It might have been a weekend project. That goes into the costs. I think we were joking around the other day that, with this architecture, we may have saved ourselves $500 a month on AWS bills in exchange for $10,000 or more a month in additional development fees.
Chris: Going back to your example of building a Formula 1 car versus a Honda: just keep in mind that whatever car you build, you’re going to be driving it through the city during rush hour. You can only go five miles an hour anyhow, so the F1 doesn’t give you anything there. I was laughing. In Downtown Redmond, the traffic has gotten so bad for a town that’s not Seattle. It’s smaller, but it has grown quite a bit with Microsoft, and the traffic’s just terrible. It’s bumper-to-bumper and it takes you 20 minutes just to go a few miles, and then you see these really nice sports cars, a Ferrari and a Lamborghini, and it’s like, “Really?” That’s a really expensive way to go from Point A to Point B very, very slowly.
Jon: I think we have this app that’s been created, and we might want to build on it. We have options. One of the options would be to use Express: take the part of the application that is just a backend for a React application, just a little REST API, stick that into a Docker image, and run it on ECS so that it’s easier to maintain. We could do that, or we could keep it as a learning bench for Kelsus, like here’s a place you can go to really mess around with CloudFormation, with Lambda, and with some of the other managed services that I took advantage of, like Cognito, et cetera. I think it’s likely that we’ll do that, with full knowledge that those learning experiences will probably benefit our clients down the road.
Chris: Absolutely. Things like DynamoDB are super important and just fabulous technology that we definitely want to use more and more, going forward, and things like Cognito. That’s another important foundational service that we’ll want to look at leveraging more and more in the future, and definitely Lambda. There are definitely very real situations that warrant using Lambda, and it’s very good to use it. We’ll be incorporating more of that as we go forward as well. Yeah, it’s a great learning workbench, if you will, to do this.
Jon: Hopefully, we’ll actually use it to keep track of our index.
Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources, is available at mobycast.fm/46. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.