February 26, 2020

101. Database Soup – Explaining ACID, BASE, CAP – Part 3

Summary

This week on Mobycast, Jon and Chris conclude their multi-part series on “database soup”, where we make sense of the jumbled acronyms of consistency models.

In this episode, we learn about eventual consistency and the BASE properties. Eventual consistency may sound like a beer guy meme – “I am not always consistent, but when I am, I get there eventually.” But it’s no joke – eventual consistency is a key technique for scaling systems, and it’s important to know what it is and when to use it.

We finish up by summarizing what we have learned about ACID and BASE and the tradeoffs each makes. Afterwards, you’ll no longer confuse consistency models with the pH scale of your high school chemistry class.

Show Details

In this episode, we cover the following topics:

Recap
- We are in the midst of our multi-part series on “database soup” – consistency models explained in three acts
  - Previously we covered:
    - “Act I: Transaction processing”
      - We learned about transactions and in particular the ACID properties
    - “Act II: The arrival of the Internet creates new challenges”
      - Building large scale-out systems leads to “discovery” of the CAP theorem
  - Today we finish up with “Act III: Eventual consistency saves the web”
Eventual consistency and BASE
- Motivation behind the philosophy
  - How do we build internet-scale databases?
    - Rethink requirements
    - What type(s) of data are we storing?
    - In particular, what’s our consistency model?
      - Strongly consistent vs. eventually consistent
- Properties
  - Basically available
    - System guarantees availability, in terms of the CAP theorem
  - Soft state
    - State of the system may change over time, even without input (due to eventual consistency model)
  - Eventual consistency
    - Consistency model used to achieve high availability
    - System will become consistent over time
      - If no new updates are made to a given data item, eventually all accesses to that item will return the last updated value
    - Liveness vs safety guarantees
      - Purely a liveness guarantee (reads eventually return the same value)
      - No safety guarantees: system can return any value before it converges
    - When the system achieves eventual consistency, it is said to have “converged”
    - Also called “optimistic replication”
- Examples of systems with BASE semantics
  - NoSQL
    - Google BigTable // Google Cloud Datastore
    - Amazon DynamoDB
    - Cassandra
    - Microsoft Cosmos DB
    - MongoDB
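
The convergence behavior described above can be illustrated with a small sketch. This is a toy model, not how any of the listed databases is implemented: the `Replica` class, the `anti_entropy` sync function, and the last-writer-wins timestamp rule are all assumptions chosen for illustration. It shows replicas disagreeing right after writes (no safety guarantee) and agreeing once updates stop and the replicas exchange state (the liveness guarantee).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each key maps to a (timestamp, value) pair."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Last-writer-wins (an assumed conflict rule): keep the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so each replica keeps the newest version."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


def converged(replicas, key):
    """True when every replica returns the same value for the key."""
    return len({r.read(key) for r in replicas}) == 1


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]

# Two clients write to different replicas; the third replica has seen nothing yet.
replicas[0].write("cart", "book", timestamp=1)
replicas[1].write("cart", "book,pen", timestamp=2)
before = [r.read("cart") for r in replicas]

# No new updates arrive; background sync spreads the latest write.
anti_entropy(replicas[0], replicas[1])
anti_entropy(replicas[1], replicas[2])
after = [r.read("cart") for r in replicas]
```

Before the sync, a read can return the old value, the new value, or nothing at all, depending on which replica answers. The third replica's state also changes during the sync without any client writing to it, which is the "soft state" property.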
Personal story
- Amazon and DynamoDB
  - Werner Vogels’ keynote at re:Invent 2018
    - “My worst day at Amazon was 12/04/2004…”
    - That was the order deadline date for free super-saver shipping orders to be delivered by Christmas
    - OracleDB was used for storing orders, items, and customers
    - OracleDB went down for 12 hours because of a database bug
  - The post-mortem analysis
    - They realized RDBMS is not designed for the Internet/cloud
    - They also noticed that the information stored in the RDBMS had the following characteristics:
      - 70% was single table, single row
      - 20% was single table, multiple rows
      - Only remaining 10% involved multiple tables
        
        In other words, not relational data
  - Follow up actions
    - This led to building a new type of database for the Internet with specific features:
      - Sharding
        
        At the application layer
      - Shared nothing clusters
        
        Cell based architecture
        
        Each cell has its own application and persistence layer
        
        Think region -> AZ -> service cell
      - Shared disk
    - New database became DynamoDB
      - DynamoDB features an eventual consistency model
        
        Allows user to make tradeoffs between availability and performance at a certain cost point
      - Eventual consistency model was made possible by architecture/design choices
        
        Sharding
        
        Shared nothing clusters
- Microsoft/Viathan and Leviathan
  - Wait a second… this story seems familiar…
    - Microsoft circa 1997-1998
      - Building first wave of large scale internet applications for Microsoft Network (MSN)
      - Had same realization that Amazon did
        
        Data being stored not relational, we called it “Internet data”
        
        Very, very similar to “document” database model
      - We just had this realization at least 6 years before Amazon did
      - Work on “Internet File Store” (IFS)
        
        “Extensible storage system” (patent filed March 11, 1999)
    - Viathan circa 1999-2001
      - Built Leviathan database system and Venus virtual file system
      - Both systems built on
        
        Sharding
        
        Shared nothing clusters
        
        Shared disk
  - To go deeper, go listen to Mobycast episodes 39 – 43
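
Both stories rely on sharding at the application layer over shared-nothing clusters. A minimal sketch of that idea follows, assuming a simple hash-modulo placement rule; the `ShardedStore` class, the `shard_index` helper, and the sample order keys are hypothetical and are not taken from DynamoDB or Leviathan. It works well for the single-table, single-row access pattern called out in the post-mortem, because each lookup touches exactly one shard.

```python
import hashlib


def shard_index(key: str, shard_count: int) -> int:
    """Map a key to a shard number with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Application-layer router over independent, shared-nothing shards."""

    def __init__(self, shard_count: int):
        # Each dict stands in for a separate database with its own storage.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, row: dict) -> None:
        self.shards[shard_index(key, len(self.shards))][key] = row

    def get(self, key: str):
        return self.shards[shard_index(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
store.put("order-1001", {"customer": "c-7", "total": 42.50})
store.put("order-1002", {"customer": "c-9", "total": 13.25})
fetched = store.get("order-1001")
```

One limitation of the modulo rule is that changing the shard count moves most keys to a different shard; consistent hashing is a common alternative when shards are added or removed often.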
Putting it all together
- ACID systems choose consistency over availability
- BASE systems choose availability over consistency
  - Necessary in order to scale
- BUT stay tuned… we are now seeing ACID-compliant systems at internet scale
  - Aurora, Cosmos, YugabyteDB
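
The tradeoff summarized above can be sketched as two read policies on a node that has lost contact with its peers. This is a deliberately simplified toy: the `Node` class, the `partitioned` flag, and the `read_consistent` / `read_available` functions are assumptions for illustration and do not describe any particular database.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale read."""


class Node:
    def __init__(self, value):
        self.value = value
        self.partitioned = False  # True when the node cannot reach its peers


def read_consistent(node):
    # Consistency over availability: refuse if the value cannot be confirmed.
    if node.partitioned:
        raise Unavailable("cannot confirm the latest value during a partition")
    return node.value


def read_available(node):
    # Availability over consistency: always answer, even if the value may be stale.
    return node.value


node = Node("v1")
node.partitioned = True  # simulate a network partition

available_result = read_available(node)
try:
    read_consistent(node)
    consistent_failed = False
except Unavailable:
    consistent_failed = True
```

During the simulated partition, the availability-first read still returns a value that may be out of date, while the consistency-first read fails until the partition heals.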

End Song

Whisper In A Dream (Feathericci Remix) by Uskmatu

More Info

For a full transcription of this episode, please visit the episode webpage.

We’d love to hear from you! You can reach us at:

Web: https://mobycast.fm
Voicemail: 844-818-0993
Email: ask@mobycast.fm
Twitter: https://twitter.com/hashtag/mobycast
Reddit: https://reddit.com/r/mobycast
