101. Database Soup – Explaining ACID, BASE, CAP – Part 3
Summary
This week on Mobycast, Jon and Chris conclude their multi-part series on “database soup”, where we make sense of the jumbled acronyms of consistency models.
In this episode, we learn about eventual consistency and the BASE properties. Eventual consistency may sound like a beer guy meme – “I am not always consistent, but when I am, I get there eventually.” But it’s no joke – eventual consistency is a key technique for scaling systems, and it’s important to know what it is and when to use it.
We finish up by summarizing what we have learned about ACID and BASE and reviewing the tradeoffs each makes. Afterwards, you’ll no longer confuse consistency models with the pH scale of your high school chemistry class.
Show Details
In this episode, we cover the following topics:
- Recap
- We are in the midst of our multi-part series on “database soup” – consistency models explained in three acts
- Previously we covered:
- “Act I: Transaction processing”
- We learned about transactions and in particular the ACID properties
- “Act II: The arrival of the Internet creates new challenges”
- Building large scale-out systems leads to “discovery” of the CAP theorem
- Today we finish up with “Act III: Eventual consistency saves the web”
- Eventual consistency and BASE
- Motivation behind the philosophy
- How do we build internet-scale databases?
- Rethink requirements
- What type(s) of data are we storing?
- In particular, what’s our consistency model?
- Strongly consistent vs. eventually consistent
- Properties
- Basically available
- System guarantees availability, in terms of the CAP theorem
- Soft state
- State of the system may change over time, even without input (due to eventual consistency model)
- Eventual consistency
- Consistency model used to achieve high availability
- System will become consistent over time
- If no new updates are made to a given data item, eventually all accesses to that item will return the last updated value
- Liveness vs safety guarantees
- Purely a liveness guarantee (reads eventually return the same value)
- No safety guarantees: system can return any value before it converges
- A system that has achieved eventual consistency is said to have “converged”
- Also called “optimistic replication”
- Examples of systems with BASE semantics
- NoSQL
- Google BigTable // Google Cloud Datastore
- Amazon DynamoDB
- Cassandra
- Microsoft Cosmos DB
- MongoDB
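To make the eventual consistency properties above more concrete, here is a minimal sketch of replicas converging on the last updated value. It is an illustration only, not how any of the listed databases actually work: the `Replica` class, the last-write-wins timestamp rule, and the random push-style `anti_entropy_round` are all assumptions chosen to keep the example small.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    """One replica of a single data item (hypothetical, for illustration)."""
    timestamp: int = 0
    value: str = ""

    def write(self, timestamp: int, value: str) -> None:
        # Last-write-wins: only accept an update newer than what we hold.
        if timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value

    def read(self) -> str:
        # No safety guarantee: this may be stale until the system converges.
        return self.value


def anti_entropy_round(replicas, rng):
    """Each replica pushes its current state to one randomly chosen peer."""
    for source in replicas:
        target = rng.choice(replicas)
        target.write(source.timestamp, source.value)


def converged(replicas):
    """True once every replica holds the same (timestamp, value) pair."""
    return len({(r.timestamp, r.value) for r in replicas}) == 1


def simulate(num_replicas=5, seed=0, max_rounds=1000):
    rng = random.Random(seed)
    replicas = [Replica() for _ in range(num_replicas)]
    # Two updates land on different replicas; the others know nothing yet.
    replicas[0].write(1, "v1")
    replicas[1].write(2, "v2")  # the last update
    rounds = 0
    # No new updates arrive, so background replication is the only activity.
    while not converged(replicas) and rounds < max_rounds:
        anti_entropy_round(replicas, rng)
        rounds += 1
    return replicas, rounds


if __name__ == "__main__":
    replicas, rounds = simulate()
    print(f"converged after {rounds} round(s):", [r.read() for r in replicas])
```

Before the loop finishes, a read may return "", "v1", or "v2" depending on which replica is asked (no safety guarantee). Once updates stop, every replica ends up returning "v2" (the liveness guarantee). The replicas also change state without any client input, which is the “soft state” property.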
- Personal story
- Amazon and DynamoDB
- Werner Vogels’ keynote at re:Invent 2018
- “My worst day at Amazon was 12/04/2004…”
- That was the order deadline date for free super-saver shipping orders to be delivered by Christmas
- OracleDB was used for storing orders, items, and customers
- OracleDB went down for 12 hours because of a database bug
- The post-mortem analysis
- They realized RDBMS is not designed for the Internet/cloud
- They also noticed that the information stored in the RDBMS had the following characteristics:
- 70% was single table, single row
- 20% was single table, multiple rows
- Only the remaining 10% involved multiple tables
- In other words, not relational data
- Follow up actions
- This led to building a new type of database for the Internet with specific features:
- Sharding
- At the application layer
- Shared nothing clusters
- Cell based architecture
- Each cell has its own application and persistence layer
- Think region -> AZ -> service cell
- Shared disk
- New database became DynamoDB
- DynamoDB features an eventual consistency model
- Allows user to make tradeoffs between availability and performance at a certain cost point
- Eventual consistency model was made possible by architecture/design choices
- Sharding
- Shared nothing clusters
- Microsoft/Viathan and Leviathan
- Wait a second… this story seems familiar…
- Microsoft circa 1997-1998
- Building first wave of large scale internet applications for Microsoft Network (MSN)
- Had same realization that Amazon did
- Data being stored not relational, we called it “Internet data”
- Very, very similar to “document” database model
- We just had this realization at least 6 years before Amazon did
- Work on “Internet File Store” (IFS)
- “Extensible storage system” (patent filed March 11, 1999)
- Viathan circa 1999-2001
- Built Leviathan database system and Venus virtual file system
- Both systems built on
- Sharding
- Shared nothing clusters
- Shared disk
- To go deeper, go listen to Mobycast episodes 39 – 43
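Both stories above lean on sharding at the application layer over shared-nothing clusters. The sketch below shows the general idea under stated assumptions: the `Shard` and `ShardRouter` classes, the `order-1001` key, and the simple hash-modulo placement are hypothetical and are not the actual partitioning scheme of DynamoDB or Leviathan.

```python
import hashlib


class Shard:
    """A shared-nothing shard: it owns its own private storage (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def put(self, key, row):
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)


class ShardRouter:
    """Application-layer routing: the application decides which shard owns a key."""

    def __init__(self, shards):
        self._shards = list(shards)

    def shard_for(self, key):
        # Use a stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, row):
        self.shard_for(key).put(key, row)

    def get(self, key):
        return self.shard_for(key).get(key)


if __name__ == "__main__":
    shards = [Shard(f"shard-{i}") for i in range(4)]
    router = ShardRouter(shards)
    router.put("order-1001", {"customer": "alice", "total": 42})
    print(router.shard_for("order-1001").name, router.get("order-1001"))
```

A single-table, single-row lookup touches exactly one shard, which is why the access pattern from the post-mortem (70% single table, single row) fits this design so well. Note that plain hash-modulo placement remaps most keys when the number of shards changes; it is used here only to keep the sketch short.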
- Putting it all together
- ACID systems choose consistency over availability
- BASE systems choose availability over consistency
- Necessary in order to scale
- BUT stay tuned… we are now seeing ACID-compliant systems at internet scale
- Aurora, Cosmos, YugabyteDB
Links
End Song
Whisper In A Dream (Feathericci Remix) by Uskmatu
More Info
For a full transcription of this episode, please visit the episode webpage.
We’d love to hear from you! You can reach us at:
- Web: https://mobycast.fm
- Voicemail: 844-818-0993
- Email: ask@mobycast.fm
- Twitter: https://twitter.com/hashtag/mobycast
- Reddit: https://reddit.com/r/mobycast
Coming soon…