May 1, 2019

58. Practical Issues with RDS Replicas

Jon Christensen and Chris Hickman of Kelsus and Rich Staats of Secret Stache discuss practical issues with relational database service (RDS) replicas.

Some of the highlights of the show include:

Two primary scaling options for distributed systems: Up or Out
Scaling up is simple and straightforward, but bigger and more capable machines are expensive; scaling out uses a cluster of commodity-hardware nodes to handle traffic
Replicating State: Cluster built from set of different machines can access same data and answer same requests
Potential Problems: What happens when one node goes down? Who’s the authority?
Strongly consistent (exact same state for every node) vs. eventually consistent (set of different machines trying to attain same state)
Configure RDS by enabling Multi-AZ or read replicas, but beware of possible ramifications to architectures and applications

Links and Resources

Amazon Relational Database Service (RDS)

Microsoft Azure SQL Database

High Availability (Multi-AZ) for Amazon RDS

Rich: In episode 58 of Mobycast, Jon and Chris discuss the practical issues with RDS replicas. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.

Jon: Welcome, Chris and Rich. It’s another episode of Mobycast.

Chris: Hey.

Rich: Hey, guys. It’s good to be back.

Jon: What have you been up to, Rich? We missed you last week.

Rich: Yeah, I think it might have been a couple weeks, actually. I’m not even sure the last time I’ve been on.

Jon: You still have been doing your work as a producer. Everyone listening wouldn’t have heard the last few weeks if […], so thank you.

Rich: My pleasure. Last week, my dad was in town, so I spent most of my time just hanging out with him here in Denver. This week, or at least today (my dad left yesterday), has been about getting back on track and trying to put the pieces together.

Jon: Right on. How about you, Chris?

Chris: I’m still digesting a bit of sad news from yesterday. I saw the announcement that the startup I was at before Kelsus, Arivale, abruptly turned off the lights and shut down. Arivale is a scientific wellness company. The mission behind it was that our conventional medical system is all focused on treating symptoms and disease, and instead we should be focusing on having the best health possible: be more preventative, be more proactive, really just have the most vitality that you can. They did that through science, with things like genetic testing, bloodwork, and the microbiome, plus behavior change, coaching, and whatnot.

They were founded in 2015 and raised a bunch of money. Lee Hood was one of the founders, and he’s a legend in the medical biotechnology space. It was sad to see them say yesterday, “Look, the business model doesn’t work. We’re just shutting it down.”

Jon: That’s really too bad. I even tried out a couple of their services. If nothing else, I could tell that everybody there was really committed, really hard-working, and wanted to see this succeed. I’m bummed to hear that, too.

Chris: There was definitely a lot of passion in there, and you need that passion to stick with it. The company just had a really hard time getting buy-in from the public. That’s a comment on society in general: being proactive and doing the hard work without seeing an immediate gain is not something we do. Instead, we just want the pill that fixes us. That’s one of the reasons they weren’t able to make it: they just couldn’t find a population of customers who valued that and would do the hard work before being faced with a diagnosis.

Jon: Right. You’ve known me long enough to know that one of my favorite metaphors when I talk to new startups is: are you selling vitamins or are you selling ibuprofen? I would now add: or are you selling amphetamines? The most wildly successful startups are selling amphetamines, and the ones that struggle are selling vitamins. That’s kind of what Arivale was, wasn’t it? They were like, “Take your vitamins.”

Chris: Absolutely. Hopefully, someday.

Jon: Today, we’re going to talk a little bit about databases, scaling, RDS in particular. I’m excited for this because this comes straight from some really recent Kelsus experience dealing with some difficulties. Let’s see. Maybe you can get us started, Chris, and just help us dive in.

Chris: Yes. Maybe just to set the tone a little bit, let’s start with some distributed systems 101. There are really two primary ways of scaling: you can either scale up or you can scale out. From an architecture and design standpoint, scaling up is definitely the simplest. Scaling up just means that as my load increases, I buy a bigger piece of hardware, a bigger, more capable machine. If I want to double my throughput, I get a CPU that’s twice as fast or double the memory, and I just keep getting bigger machines. My software does not have to change. Maybe I have to increase my network bandwidth or upgrade my disk space or whatnot, but I don’t have to change my code. It’s pretty simple and pretty straightforward.

Obviously, there’s a limit to that. You can only get so big of a machine, and they also get more and more expensive. To deal with internet scale, you really have to scale out. Scaling out is all about having a cluster of nodes instead of one very expensive machine. These nodes are more like commodity hardware. You get the power and the scalability by having multiple nodes that are all able to answer the traffic.

It’s much more scalable, and it’s basically how a lot of systems do scale, but it’s also a lot more complicated. Now you’re replicating state, and you have to deal with what happens if one of those nodes falls out of sync or goes down. How do you know who’s the authority? There are lots of complexities, lots of algorithms, and whatnot.

Jon: Just to give a concrete example, since you just used a technical term, replicating state. Say you’ve got a cluster of 100 machines and they’re all answering requests from your web session. Maybe in your web session you’re in the middle of buying a plane ticket. There are many parts to that: you’re picking seats, you’re adding luggage. Unless you do something like pin every one of your requests to a single machine that knows exactly what you’re in the middle of (not a good idea anymore, though it is the way things used to work for a while), all the machines need to be able to know where you are in that process. That’s what you mean by replicating state.

Chris: Yes, absolutely. You build these clusters that look like one thing from the outside, but behind them there’s a set of different machines. They all have to be able to answer the same requests that come in, so they all need access to the same data.

Jon: So they don’t all make you pick your seat again.

Chris: Exactly, yes. Some terms to talk about here are strongly consistent versus eventually consistent. When you’re dealing with state and replication, you have multiple nodes in the system. Being strongly consistent means that every single one of those nodes has the exact same state: at any point in time, any one of those nodes has exactly the same state as any other. From an application standpoint that ends up being very beneficial, but from a practical standpoint it’s very expensive. It’s not really something that’s done at large scale, so you really have to think it through and decide whether or not it’s a requirement.

The flip side of that is something called eventually consistent. You have that same set of different machines, and they’re all trying to have the same state, but at any given point in time there’s no guarantee that they all do. Let’s say that at time t0, one of them has the latest up-to-date information, and maybe the other two don’t have it yet. At some later point, call it t1, they will eventually have that information, and they will all be in sync as of that last change. Of course, new changes could have come in by then, so there’s always this lag, this delta, between them.
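The t0/t1 timeline can be sketched as a toy model. This is a minimal illustration, assuming a single primary and replicas that each trail it by a fixed lag (`lag` and the `ToyReplicatedStore` class are hypothetical; real replication lag varies, and this is not how RDS works internally): a node reading at time t only sees writes made at or before t - lag.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicatedStore:
    """Toy model: writes land on the primary; a node with lag L only
    sees writes made at or before (t - L). Not how RDS works internally."""
    log: list = field(default_factory=list)  # (time, key, value), in time order

    def write(self, t: float, key: str, value: str) -> None:
        # Every write is applied to the primary at time t.
        self.log.append((t, key, value))

    def read(self, t: float, key: str, lag: float = 0.0):
        # lag=0.0 models the primary; lag > 0 models a lagging replica.
        visible = [v for (wt, k, v) in self.log if k == key and wt <= t - lag]
        return visible[-1] if visible else None


store = ToyReplicatedStore()
store.write(0.0, "status", "having fun in NYC")
print(store.read(0.0, "status"))           # -> having fun in NYC (primary, t0)
print(store.read(0.0, "status", lag=1.0))  # -> None (replica at t0, not there yet)
print(store.read(1.0, "status", lag=1.0))  # -> having fun in NYC (replica at t1)
```

Strong consistency would correspond to every node reading with `lag=0.0`; the cost Chris mentions is what it takes to make that true in a real cluster.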

Jon: I’m going to jump in again; example man is back. When would you need strong consistency? One thought: you’re selling tickets to an event, and the tickets have assigned seats. If row K, seat 3, is sold, then everybody needs to know immediately. There can’t be a case where some nodes think it’s available and some nodes think it’s sold. That’s just going to break.

On the other hand, take a social media app where someone updates their status: “Hey, I’m having fun in this cool restaurant in New York City.” Some nodes might know about that right away, others will know about it eventually, and nobody really cares if it takes a couple of seconds to propagate across the network.
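For the assigned-seat example, one common way to keep the sale decision strongly consistent is to make it against a single authoritative copy with an atomic check-and-set. A minimal sketch using Python’s built-in `sqlite3` as a stand-in for the primary; the `seats` table and the `buy` helper are hypothetical. If the “is it still available?” check ran against a lagging replica instead, two buyers could both see seat K3 as unsold.

```python
import sqlite3

# In-memory SQLite stands in for the single authoritative primary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (id TEXT PRIMARY KEY, sold INTEGER NOT NULL)")
conn.execute("INSERT INTO seats VALUES ('K3', 0)")
conn.commit()


def buy(seat_id: str) -> bool:
    """Atomically mark a seat sold; returns False if it was already sold."""
    # The WHERE clause makes check-and-set a single statement, so there is
    # no window between "is it available?" and "mark it sold".
    cur = conn.execute(
        "UPDATE seats SET sold = 1 WHERE id = ? AND sold = 0", (seat_id,)
    )
    updated = cur.rowcount  # 1 if this call flipped the seat, 0 otherwise
    conn.commit()
    return updated == 1


print(buy("K3"))  # -> True: the first buyer gets the seat
print(buy("K3"))  # -> False: the second buyer is turned away
```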

Chris: Absolutely. These are good terms to understand, and it’s good to keep those examples in mind, because they’ll come into play as we go through this. Now let’s actually start talking about databases on AWS, in particular RDS, the various configurations you have there, and the fact that dealing with this isn’t as simple as checking a box.

Jon: Right. There’s something we left out of the initial conversation, though it was there under the covers, and I want to bring it up: we talked about clusters and nodes, but we didn’t talk about the fact that there may be clusters of application servers that just run application code, know the business logic, and are more or less stateless. As I mentioned, in the past those application servers might not have been stateless; they might have kept track of in-memory state. These days they’re typically stateless. Most architects would not build stateful apps if at all possible.

That’s one cluster, and it’s not really what we’re talking about today. We’re talking about the cluster that represents your database, where you’re actually storing your state. That might be a SQL database, a Mongo database, whatever; it’s wherever you’re actually storing your state, in a single node or in a multi-node cluster. When you ask that cluster a question, or tell it to store some information for you, is that action strongly consistent or eventually consistent? I just want to separate the two: the data layer versus the application layer.

Chris: Yeah, that’s a good point. We’re absolutely talking about databases here; we’re talking about the state. Like you said, for everything else in the application layer, you definitely want to make it stateless. That way you can throw a load balancer in front of it and not have to worry about anything. It’s not as simple with data. You can’t just put your RDS database, your Postgres, behind a load balancer. That’s not going to work too well.

Moving on to RDS. RDS is the Relational Database Service offered by AWS. It offers many popular relational database engines, such as PostgreSQL, MySQL, SQL Server, and Oracle. With RDS, there are a couple of options, and it’s interesting because both the strongly consistent model and the eventually consistent model apply.

There are two main ways to configure RDS from a distributed architecture standpoint. One is to enable Multi-AZ. Think of Multi-AZ as active and backup, or primary and backup. It’s really all about availability. It doesn’t have anything to do with scaling, because with Multi-AZ, only one of those databases is actually available for serving requests. The backup is kept in sync with the primary in case the primary becomes unavailable, in which case there’s a failover.

In this particular case, you do have replicated state, and it is strongly consistent. That’s a guarantee of the system. When writes come in to the primary, they’re replicated synchronously, in real time, to the backup, so the backup is guaranteed to always have the same state as the primary. You need that for a failover: if it weren’t the case, then when you failed over, you would see data loss, which would be a big problem.

That’s the Multi-AZ option with RDS. Again, you’re replicating state in a strongly consistent way, and you pay a performance penalty for that: your writes take a little longer, because each write has to wait for acknowledgment that the write to the backup happened.

The other main way of configuring RDS is to enable read replicas. With read replicas, you’re creating multiple servers for your database and it is replicating the state, but the replication happens asynchronously. There’s no guarantee that when a write happens, it gets propagated to all your read replicas at the same time. Instead, it’s done in the background, and there’s some replication lag. Basically, it adheres to the eventually consistent model.

What this is good for is that you can now use those read replicas to serve your read operations. Read replicas are definitely something you use to increase the scalability of the system, especially if you have a read-heavy application. They aren’t going to help you if most of your application’s operations are writes. Fortunately, most applications out there are much more read-intensive than write-intensive, so you can enable read replicas to increase the scalability of your application.
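The difference between the two configurations comes down to when a write is acknowledged. Below is a toy in-memory sketch, assuming one synchronous standby (Multi-AZ style) and one asynchronous read replica; the `ToyPrimary` class and its methods are hypothetical, and this is not how RDS implements replication internally.

```python
class ToyPrimary:
    """Toy model of a primary with a synchronous standby (Multi-AZ style)
    and an asynchronous read replica. Not RDS internals."""

    def __init__(self):
        self.data = {}
        self.standby = {}   # identical to data after every acknowledged write
        self.replica = {}   # catches up only when replicate() runs
        self.pending = []   # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        # Synchronous: the write is only acknowledged after the standby has it,
        # which is where the extra write latency comes from.
        self.standby[key] = value
        # Asynchronous: queue the change for the replica and return immediately.
        self.pending.append((key, value))
        return "ack"

    def replicate(self):
        # Background step; the time before this runs is the replication lag.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


db = ToyPrimary()
db.write("seat:K3", "sold")
print(db.standby.get("seat:K3"))  # -> sold (standby is always in sync)
print(db.replica.get("seat:K3"))  # -> None (replica has not caught up yet)
db.replicate()
print(db.replica.get("seat:K3"))  # -> sold
```

Only the standby path gives the failover guarantee; only the replica path adds read capacity, at the cost of the lag window.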

Jon: Right. So, you’re a startup and you’ve got an application out there. It’s got pretty low traffic most of the time, and it’s just been living on a single medium-sized RDS instance, with Multi-AZ configured because availability is important to you. All of a sudden, something goes viral, thousands of people are signing on, you’re looking at your database load, and you’re like, “Oh my God. This database is not going to be able to handle this. We don’t want to turn users away or look like we’re having a bad time.” So let’s do it. Let’s just flip on read replicas, right?

Chris: Absolutely. This is like AWS 101. You do a Google search or you look at the documentation, and it says, “Hey, if you need more scalability for your read operations on RDS, enable read replicas.” It’s something in the console, super easy, basically a checkbox. It’s really easy to enable read replicas for a database.

What’s interesting is what gets glossed over: that’s not all the work. There are actually big ramifications for your architecture and your application in order to consume this. You do have to roll up your sleeves and do some heavy lifting to take advantage of this new architecture.

It’s definitely pretty sophisticated. You’re going from a single node to a multi-node stateful environment, with replication going on. How do you now load balance your requests across these things? These are all things you have to consider once you’ve made that choice.

Jon: Right, Chris. I just have to say something really quick. Before I do, I just want any potential clients of Kelsus to just go ahead and turn off the podcast right now. Turn it off.

Okay, now that they’re gone, I will say that there was a team at Kelsus that recently made this mistake and they just flipped on the checkbox. Guess what happened? Maybe Chris can tell us.

Chris: I’m always blown away that in the documentation, and everywhere you look, people, including solutions architects, will just say, “Just enable read replicas and away you go.” It’s not that simple. For one thing, your read replicas actually have different endpoints: they’re different DNS entries than your primary. If you don’t change anything in your code, it’s still going to go against your primary, even though you enabled read replicas. So now you have to change your code.
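Because the replicas sit behind different endpoints, the application has to choose a host per operation. A minimal sketch of that choice, assuming hypothetical hostnames (real endpoint names come from your RDS configuration) and simple round-robin across replicas; in practice a driver or ORM setting often handles this for you.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class DbEndpoints:
    primary: str          # hypothetical writer hostname
    replicas: tuple = ()  # hypothetical read replica hostnames


class EndpointRouter:
    """Writes go to the primary; reads go round-robin across the replicas,
    falling back to the primary if none are configured."""

    def __init__(self, endpoints: DbEndpoints):
        self.endpoints = endpoints
        self._replicas = cycle(endpoints.replicas) if endpoints.replicas else None

    def for_write(self) -> str:
        return self.endpoints.primary

    def for_read(self) -> str:
        return next(self._replicas) if self._replicas else self.endpoints.primary


router = EndpointRouter(DbEndpoints(
    primary="mydb-primary.example.internal",
    replicas=("mydb-replica-1.example.internal", "mydb-replica-2.example.internal"),
))
print(router.for_write())  # -> mydb-primary.example.internal
print(router.for_read())   # -> mydb-replica-1.example.internal
print(router.for_read())   # -> mydb-replica-2.example.internal
```

Note that this only decides where a query goes; it cannot tell whether a given query is safe to send to a replica. That is the caller’s job, which is where the next problem comes from.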

Jon: Right, and they did. They just made a quick hot patch, pointed the read operations at the read replica, done, right? It works.

Chris: Absolutely. If you’ve somehow, magically, made it so that your application only sends read operations to the replicas, then that’s fine. If you send a write operation to a replica, that’s not going to go so well. The replica will fail it; you’ll get denied. It’s a read replica. It’s read-only.

Jon: That’s exactly what happened to us. This may be a language-specific thing. We were using a particular JavaScript library called Sequelize; maybe many of you listening are using it too. Apparently, it’s possible to label the queries in Sequelize. I haven’t actually used it myself, but you label queries for what they are: “This is an ‘insert.’ This is a ‘write.’ This is a ‘query.’”

I guess Sequelize doesn’t really care if you mislabel them. If you have a single database and you label something as a query when it actually contains an update, it doesn’t say, “Hey, you can’t do that,” but the read replica will. If you try to send an update to the read replica, it will say, “Hey, you can’t do that.” Sequelize decides whether to use a read replica or the main database based on how those queries are labeled, so it’s really important to get the labels exactly right.
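The failure mode Jon describes can be reproduced without RDS. In this sketch, assumed purely for illustration, Python’s `sqlite3` plays both roles: a normal connection is the “primary,” and a read-only connection to the same file (opened with `mode=ro`) is the “replica.” The hypothetical `run` helper routes by the caller’s label, the way label-based routing works; it is not Sequelize’s API. A correctly labeled update succeeds, while an update mislabeled as a read goes to the replica and is rejected.

```python
import os
import sqlite3
import tempfile

# A read-write connection plays the primary; a read-only connection to the
# same file plays the replica. This reproduces the failure shape, not RDS.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
primary = sqlite3.connect(path)
primary.execute("CREATE TABLE seats (id TEXT PRIMARY KEY, sold INTEGER)")
primary.execute("INSERT INTO seats VALUES ('K3', 0)")
primary.commit()
replica = sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def run(sql, label, params=()):
    """Route by the caller's label only: 'select' goes to the replica,
    anything else to the primary. The SQL itself is never inspected."""
    conn = replica if label == "select" else primary
    try:
        rows = conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        conn.rollback()  # leave no half-open transaction behind
        raise
    if conn is primary:
        conn.commit()
    return rows


run("UPDATE seats SET sold = 1 WHERE id = ?", "update", ("K3",))  # correct label
print(run("SELECT sold FROM seats WHERE id = ?", "select", ("K3",)))  # -> [(1,)]
try:
    # Mislabeled: an UPDATE tagged as a read is sent to the replica.
    run("UPDATE seats SET sold = 0 WHERE id = ?", "select", ("K3",))
except sqlite3.OperationalError as exc:
    print("replica rejected write:", exc)
```

Nothing in the routing layer complains about the mislabel; the error only surfaces when the read-only side receives the write, just as with a real replica.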

Chris: Absolutely. Once you enable read replicas, you have to consider your application. Again, most of the libraries people use as middleware for data modeling and talking to a database understand the difference between reads and writes, for exactly this scenario.

But it’s not like they can analyze the exact operation you’re sending and always know whether it should go to the readers or to the writer. There may be some basic, straightforward syntactic […] operations where it knows that a create or an insert should definitely go to the writer, to the master, the primary. But maybe you have some arbitrary SQL, some complicated SQL that goes across multiple tables, does some joins, maybe does something else in there.

You have a more hand-rolled SQL statement, and the library isn’t necessarily going to know how to label it. You really do have to think this through. Once you make the change to go to your replicas, the onus is on you to make sure you’ve done that coverage: that you’ve gone through the code and identified and flagged every place you make these kinds of database operations. Otherwise, you’re going to get a really rude awakening when you go to production. Maybe there’s one very rarely executed operation that you didn’t catch. It starts failing at midnight on a Saturday night because it actually is a mutating operation: it should have gone to the primary, but you sent it to the replica, and the replica failed it. And it’s going to keep failing, because your code isn’t going to do anything different. That operation is now just completely broken.
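A quick sketch of why guessing read versus write from the SQL text is fragile. The hypothetical `naive_is_read` below classifies a statement by its first keyword, the kind of shortcut a hand-rolled router might take (it is not any particular library’s logic). It handles simple statements but misclassifies two PostgreSQL cases: a data-modifying `WITH ... UPDATE ... RETURNING` statement, and a `SELECT` that calls `nextval()`, which advances a sequence and so cannot run on a read-only replica.

```python
import re

READ_KEYWORDS = {"select", "with", "show"}


def naive_is_read(sql: str) -> bool:
    """Deliberately naive: treat a statement as a read if its first keyword
    is SELECT, WITH, or SHOW. Shown to illustrate the failure, not to use."""
    match = re.match(r"\s*(\w+)", sql)
    return bool(match) and match.group(1).lower() in READ_KEYWORDS


# (statement, does it actually modify state?)
cases = [
    ("SELECT * FROM seats", False),
    ("INSERT INTO seats VALUES ('K4', false)", True),
    # PostgreSQL data-modifying CTE: starts with WITH but performs an UPDATE.
    ("WITH s AS (UPDATE seats SET sold = true WHERE id = 'K3' RETURNING id) "
     "SELECT * FROM s", True),
    # nextval() advances a sequence, so this SELECT modifies state.
    ("SELECT nextval('order_id_seq')", True),
]
for sql, actually_writes in cases:
    guessed_read = naive_is_read(sql)
    # A write guessed as a read would be routed to a replica and fail there.
    status = "WRONG" if guessed_read and actually_writes else "ok"
    print(f"{status:5} read={guessed_read!s:5} {sql}")
```

The last two statements are the “rarely executed operation” problem in miniature: they look like reads, so any heuristic keyed on the text sends them to the replica.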

Jon: Just to add some color to this, I do feel like this may be a classic startup-type problem. It’s the type of problem that happens when you try not to overbuild things in the beginning and then later try to add some capacity. It may be less of a problem when you’re building for scale from the start, because you know that on day one you’re already going to have tens or hundreds or thousands of users and need a cluster to begin with. But many of us get our experience in these startup-type situations, so it’s good to know about.

Chris: It’s not even just startups. At the beginning of a project, you don’t necessarily need read replicas; the traffic’s not there. Even in an enterprise or a big company, you’ll probably go through this same kind of life cycle. You won’t be doing it from the get-go.

Another big gotcha with replicas relates to strongly consistent versus eventually consistent. These read replicas are eventually consistent. Normally, when your application code performs operations on the database, it’s operating against a strongly consistent data model: you write something to the database, then you read it back, and you expect it to be there.

When you go to a multi-node system with read replicas, that’s not the case anymore. If you write to the primary and then immediately try to read that data back from one of the replicas, it may not be there yet; it still has to replicate. This has really severe consequences for your application. You have to think through where your code expects the strongly consistent model. In those places, you can’t read from the replicas; you have to go against the primary. The traffic going to the read replicas should be only the read operations that can tolerate the eventually consistent model.
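One common mitigation is to route reads to the primary whenever freshness matters, for example for a short window after the same session has written. A minimal sketch, assuming a hypothetical `pin_seconds` window chosen to exceed typical replication lag. That window is a guess rather than a guarantee, so reads that truly must see the latest data should be flagged with `needs_fresh` (or run inside a transaction on the primary).

```python
import time


class ReadYourWritesRouter:
    """Reads go to a replica unless the caller needs fresh data or this
    session wrote within the last `pin_seconds` (an assumed lag bound)."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write = {}  # session id -> time of that session's last write

    def route_write(self, session: str) -> str:
        self._last_write[session] = self.clock()
        return "primary"

    def route_read(self, session: str, needs_fresh: bool = False) -> str:
        last = self._last_write.get(session)
        wrote_recently = last is not None and self.clock() - last < self.pin_seconds
        return "primary" if needs_fresh or wrote_recently else "replica"


now = [100.0]  # fake clock so the example is deterministic
router = ReadYourWritesRouter(pin_seconds=5.0, clock=lambda: now[0])
router.route_write("alice")        # alice inserts a row
print(router.route_read("alice"))  # -> primary (she must see her own write)
print(router.route_read("bob"))    # -> replica (bob has not written anything)
now[0] = 106.0
print(router.route_read("alice"))  # -> replica (the pin window has expired)
```

The multi-step sequence in Jon’s story below is the case this targets: step two reads what step one inserted, so it must not land on a replica that has not caught up.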

Jon: Right. And since it’s just us again, with all the potential clients gone, here’s another story. At Kelsus, after fixing the issue with labeling queries and having read replicas working in production for several weeks in a row, we got bit again. We were like, “How did we get bit again?” It turned out to be exactly this eventual consistency thing. Somebody wrote some code that expected to go through a sequence of five or six database operations that were not part of a single transaction. In the second operation, they were reading some data they had inserted into a table in the first operation.

Our staging environment was not the same as our production environment. In staging, everything just worked beautifully. Then, when we put it into production, where the business was depending on it working absolutely right, it totally broke because of these eventual consistency problems. I think the lesson learned there is not just to know about eventual consistency and plan for it, but also not to take shortcuts with your staging environment, even if it costs you extra to have a full copy of production in staging.

Chris: Absolutely. I think the big takeaway here is that things like read replicas, having clusters of state, are very important for scaling. You’re going to have to do it. But even with all the help out there and these managed systems, it’s not as simple as checking something off in a console. You really have to understand what it means for your architecture. What needs to change in your software? How do you test this? How do you verify that it’s working the way you expect? It’s not easy work.

Jon: Yeah, I’d add that when you run into a problem caused by eventual consistency, it’s going to be a big problem. You can’t just do a quick hot patch. It’s like, “Oh my God, the data is really getting messed up. Where exactly is the SQL that’s depending on something that just got inserted?” That’s usually not a quick one-line change. Planning in advance is going to save you some real headaches with your users.

Chris: Usually, it has a feeling of spooky logic. It’s fuzzy. When you have this problem, it’s not like you’re necessarily failing. It’s just that the data you expected to be there isn’t there, and that has ramifications. You’re going to get the wrong results, and you may not be able to detect that very quickly. It may take some time before you start hearing complaints from end users saying, “Why do you think a lizard is cuter than this kitten? Why did the lizard beat it in the rankings? That doesn’t make any sense.” You know what? The kitten’s data is still replicating.

Jon: Exactly. I think that’s a wrap. Thank you for talking through such an important part of database architecture.

Chris: Yeah, this was fun. Thanks, guys.

Jon: Thanks, Chris. Thanks, Rich.

Chris: See you.

Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources, is available at mobycast.fm/58. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you, and we’ll see you again next week.
