8 Database Concepts
8.1 Introduction
A database is a collection of information that is organized for easy access, management, and updates. In distributed systems, understanding database concepts like CAP theorem, ACID properties, replication, and sharding is crucial.
8.2 The Three Guarantees of Distributed Systems
Before diving into database-specific concepts, let’s understand the three fundamental guarantees that distributed database systems strive for:
8.2.1 1. Consistency
Definition: Every node in a distributed cluster returns the same, most recent, successful write.
What it means:
- All clients see the same data at the same time
- Reads reflect the latest writes
- No stale data
Example:
User updates profile: name = "Jane"
Immediately after, any read returns "Jane" (not old value)
8.2.2 2. Availability
Definition: Every non-failing node returns a response for all read and write requests in a reasonable amount of time.
What it means:
- System continues to operate even during failures
- Requests always get responses
- No request is ignored
Example:
Even if 2 out of 5 database nodes fail,
the system still processes read/write requests
8.2.3 3. Partition Tolerance
Definition: The system continues to function despite network partitions that prevent some nodes from communicating.
What it means:
- System works even when network splits
- Nodes can be temporarily disconnected
- Handles network failures gracefully
Example:
Data center network issue causes split
System continues operating on both sides

8.3 CAP Theorem
CAP theorem (Brewer’s theorem) states:
It is impossible for a distributed system to simultaneously provide more than two out of three guarantees: Consistency, Availability, and Partition Tolerance.

8.3.1 The Three Combinations
Since network partitions will happen in distributed systems, partition tolerance is required. This leaves us choosing between:
CP (Consistency + Partition Tolerance)
Characteristics:
- Sacrifices availability for consistency
- During partition, system may refuse requests
- Ensures all nodes have same data
When to choose:
- Financial transactions
- Inventory management
- When stale data is unacceptable
Examples:
- MongoDB: Can be configured for strong consistency
- HBase: Strong consistency, may become unavailable
- Redis: (with synchronous replication)
- ZooKeeper: Coordination service requiring consistency
Trade-off:
Network partition occurs
→ System blocks writes to maintain consistency
→ Some requests fail or timeout
AP (Availability + Partition Tolerance)
Characteristics:
- Sacrifices consistency for availability
- Always responds to requests
- May return stale data
- Eventually consistent
When to choose:
- Social media feeds
- Shopping carts
- Analytics data
- When availability > immediate consistency
Examples:
- Cassandra: Highly available, eventually consistent
- DynamoDB: Designed for availability
- CouchDB: Eventually consistent
- Riak: Optimized for availability
Trade-off:
Network partition occurs
→ System accepts writes on both sides
→ Data temporarily inconsistent
→ Conflict resolution needed later
CA (Consistency + Availability)
Not possible in distributed systems with partitions!
Why: Network partitions will happen. You must handle them.
Exception: Single-node databases (not distributed)
- Traditional RDBMS on single server (MySQL, PostgreSQL)
- Not fault-tolerant
- Not partition-tolerant (because no partitions exist)
8.3.2 Choosing Your Database Based on CAP
| Requirement | Choice | Examples |
|---|---|---|
| Bank transactions | CP | PostgreSQL (with sync replication), HBase |
| User sessions | AP | Cassandra, DynamoDB |
| Social feed | AP | Cassandra, DynamoDB |
| Inventory | CP | MongoDB, PostgreSQL |
| Analytics | AP | Cassandra, Riak |
8.4 ACID Properties
ACID ensures database transactions are processed reliably. Each transaction should be ACID compliant to maintain database integrity.
8.4.1 A - Atomicity
Definition: Transaction is all-or-nothing.
What it means:
- Either all operations succeed, or all fail
- No partial transactions
- System remains in consistent state
Example:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If second UPDATE fails, first UPDATE is rolled back
-- Money is never "lost" in transit8.4.2 C - Consistency
Definition: Transaction brings database from one valid state to another.
What it means:
- All constraints are enforced
- Referential integrity maintained
- Business rules applied
Example:
-- Constraint: balance >= 0
UPDATE accounts SET balance = balance - 1000 WHERE id = 1;
-- If this violates constraint (negative balance),
-- transaction is rejected8.4.3 I - Isolation
Definition: Concurrent transactions don’t interfere with each other.
What it means:
- Transactions appear to execute sequentially
- Intermediate state not visible to others
- Prevents dirty reads, phantom reads
Isolation Levels:
- Read Uncommitted: Lowest isolation (dirty reads possible)
- Read Committed: No dirty reads
- Repeatable Read: Same row read multiple times returns same value
- Serializable: Highest isolation (transactions fully isolated)
Example:
-- Transaction 1
BEGIN;
UPDATE products SET stock = stock - 1 WHERE id = 5;
-- Not yet committed
-- Transaction 2 (cannot see uncommitted changes)
SELECT stock FROM products WHERE id = 5;
-- Returns original value until Transaction 1 commits8.4.4 D - Durability
Definition: Committed transactions survive permanently, even after crashes.
What it means:
- Once committed, data is saved
- Survives power failures, crashes
- Write-ahead logging (WAL)
Example:
INSERT INTO orders (id, total) VALUES (123, 500);
COMMIT; -- Now persisted to disk
-- Even if database crashes 1 second later,
-- order 123 will exist when system restarts8.4.5 ACID in Distributed Systems
Challenges:
- Harder to maintain across multiple nodes
- Performance trade-offs
- Network latency
Solutions:
- Two-Phase Commit (2PC): Coordinated commits across nodes
- Saga Pattern: Compensating transactions for long-running workflows
- Event Sourcing: Store events, derive state
8.5 Database Replication
Replication is the process of copying data from one database to one or more databases.
8.5.1 Why Replicate?
Benefits:
- High Availability: Failover to replica if primary fails
- Read Scaling: Distribute read queries across replicas
- Geographic Distribution: Serve users from nearby replicas
- Backup: Real-time backup copies
8.5.2 Consistency Patterns
1. Strong Consistency
Definition: Read always returns most recent write.
How it works:
- Read waits for all replicas to acknowledge write
- Synchronous replication
- Higher latency
When to use:
- Financial systems
- Inventory management
- When accuracy is critical
Example:
Write to primary → Wait for all replicas to confirm → Return success
Read from any node → Always get latest value
2. Eventual Consistency
Definition: Reads may return stale data temporarily, but will eventually reflect all writes.
How it works:
- Asynchronous replication
- Writes return immediately
- Replicas catch up over time
When to use:
- Social media
- Non-critical data
- High throughput required
Example:
Write to primary → Return success immediately
Replicas update asynchronously (may take seconds)
Reads might return old value briefly
Conflict Resolution:
- Last Write Wins (LWW)
- Vector clocks
- Application-level resolution
8.5.3 Replication Architectures

1. Single-Leader (Primary-Replica)
How it works:
- One primary (leader) accepts writes
- Multiple replicas (followers) receive updates
- Replicas handle read queries
Pros:
- Simple to implement
- No write conflicts
- Good read scalability
Cons:
- Single point of failure for writes
- Replication lag on heavy writes
- Primary can be bottleneck
Example: MySQL Master-Slave, PostgreSQL Streaming Replication
2. Multi-Leader (Multi-Master)
How it works:
- Multiple nodes accept writes
- Each leader replicates to others
- Conflict resolution required
Pros:
- Better write throughput
- Lower latency (write to nearest leader)
- No single point of failure
Cons:
- Complex conflict resolution
- Harder to maintain consistency
- More complex setup
Example: MySQL Multi-Master, Cassandra (with LWW)
3. Leaderless (Peer-to-Peer)
How it works:
- All nodes equal
- Client writes to multiple nodes
- Quorum-based reads/writes
Quorum:
N = Total replicas
W = Write quorum (nodes that must acknowledge write)
R = Read quorum (nodes queried for read)
Strong consistency when: W + R > N
Example: N=5, W=3, R=3
(3 + 3 > 5, so consistent)
Pros:
- High availability
- No single point of failure
- Good for geo-distribution
Cons:
- Complex conflict resolution
- Network overhead
- Eventual consistency
Example: Cassandra, DynamoDB, Riak
8.7 NoSQL Databases
NoSQL databases provide alternatives to relational databases for specific use cases.
8.7.1 Types of NoSQL Databases
1. Key-Value Store
Structure: Simple key → value mapping
Characteristics:
- Very fast
- Simple API (GET, PUT, DELETE)
- No schema
- Limited query capabilities
Examples:
- Redis: In-memory, data structures
- Amazon DynamoDB: Managed, scalable
- Voldemort: Distributed
Use cases:
- Session storage
- Caching
- Shopping carts
- User preferences
2. Column-Oriented (Wide-Column)
Structure: Tables with rows and dynamic columns
Characteristics:
- Store data by column, not row
- Efficient for analytical queries
- Scalable
Examples:
- Cassandra: Distributed, highly available
- HBase: Hadoop-based
- Google Bigtable: Google’s internal system
Use cases:
- Time-series data
- Analytics
- Event logging
- IoT data
3. Document-Based
Structure: Store documents (JSON, BSON, XML)
Characteristics:
- Flexible schema
- Rich query capabilities
- Nested data structures
Examples:
- MongoDB: Popular, feature-rich
- CouchDB: HTTP-based API
- Amazon DocumentDB: MongoDB-compatible
Use cases:
- Content management
- User profiles
- Product catalogs
- Mobile apps
Example Document:
{
"_id": "user123",
"name": "John Doe",
"email": "john@example.com",
"addresses": [
{
"type": "home",
"street": "123 Main St",
"city": "New York"
}
],
"preferences": {
"newsletter": true,
"theme": "dark"
}
}4. Graph-Based
Structure: Nodes (entities) and edges (relationships)
Characteristics:
- Optimized for connected data
- Efficient traversals
- Relationship queries
Examples:
- Neo4j: Popular graph database
- Amazon Neptune: Managed graph DB
- ArangoDB: Multi-model
Use cases:
- Social networks
- Recommendation engines
- Fraud detection
- Knowledge graphs
Example:
// Find friends of friends
MATCH (user:Person {name: "Alice"})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
RETURN fof.name
8.8 Summary
Choosing the right database involves understanding:
- CAP theorem: Trade-offs in distributed systems
- ACID properties: Transaction guarantees
- Replication: High availability and read scaling
- Sharding: Horizontal scaling for large datasets
- NoSQL types: Different data models for different needs
Decision factors:
- Data structure
- Query patterns
- Consistency requirements
- Scale requirements
- Cost constraints
Next chapter will explore web and application servers in system design.
