The CAP theorem, also named Brewer’s theorem,
The CAP Theorem is a fundamental theorem in distributed systems that states any distributed system can have at most two of the following three properties.
- Consistency
- Every read receives the most recent write or an error.
- Availability
- Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition tolerance
- The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
This has allowed us to handle increased traffic with lower latency, allowed an easier expansion of the database system, provided better fault tolerance in terms of replication, and so much more.
CAP theorem full form
CAP theorem, stands for Consistency, Availability and Partition Tolerance.
In system design, CAP theorem is very important to achieve high throughput with low latency for a large distributed system.
Consistency
A service that is consistent operates fully or not at all. Gilbert and Lynch use the word “atomic” instead of consistent in their proof, which makes more sense technically because, strictly speaking, consistent is the C in ACID as applied to the ideal properties of database transactions and means that data will never be persisted that breaks certain pre-set constraints. But if you consider it a preset constraint of distributed systems that multiple values for the same piece of data are not allowed then I think the leak in the abstraction is plugged (plus, if Brewer had used the word atomic, it would be called the AAP theorem and we’d all be in hospital every time we tried to pronounce it).
In the book buying example you can add the book to your basket, or fail. Purchase it, or not. You can’t half-add or half-purchase a book. There’s one copy in stock and only one person will get it the next day. If both customers can continue through the order process to the end (i.e. make payment) the lack of consistency between what’s in stock and what’s in the system will cause an issue. Maybe not a huge issue in this case – someone’s either going to be bored on vacation or spilling soup – but scale this up to thousands of inconsistencies and give them a monetary value (e.g. trades on a financial exchange where there’s an inconsistency between what you think you’ve bought or sold and what the exchange record states) and it’s a huge issue.
We might solve consistency by utilising a database. At the correct moment in the book order process the number of War and Peace books-in-stock is decremented by one. When the other customer reaches this point, the cupboard is bare and the order process will alert them to this without continuing to payment. The first operates fully, the second not at all.
Databases are great at this because they focus on ACID properties and give us Consistency by also giving us Isolation, so that when Customer One is reducing books-in-stock by one, and simultaneously increasing books-in-basket by one, any intermediate states are isolated from Customer Two, who has to wait a few milliseconds while the data store is made consistent.
Availability
Availability means just that – the service is available (to operate fully or not as above). When you buy the book you want to get a response, not some browser message about the web site being uncommunicative. Gilbert & Lynch in their proof of CAP Theorem make the good point that availability most often deserts you when you need it most – sites tend to go down at busy periods precisely because they are busy. A service that’s available but not being accessed is of no benefit to anyone.
Partition Tolerance
If your application and database runs on one box then (ignoring scale issues and assuming all your code is perfect) your server acts as a kind of atomic processor in that it either works or doesn’t (i.e. if it has crashed it’s not available, but it won’t cause data inconsistency either).
Once you start to spread data and logic around different nodes then there’s a risk of partitions forming. A partition happens when, say, a network cable gets chopped, and Node A can no longer communicate with Node B. With the kind of distribution capabilities the web provides, temporary partitions are a relatively common occurrence and, as I said earlier, they’re also not that rare inside global corporations with multiple data centres.