7  Storage and RAID

7.1 Storage Systems Overview

Storage is a critical component of system design. Understanding different storage architectures and redundancy techniques is essential for building reliable systems.

7.2 Object Store

Object storage (or object-based storage) is a storage architecture that manages data as objects, unlike traditional file systems which manage data as files in a hierarchical structure.

7.2.1 Characteristics

Objects consist of:

  • Data: The actual content (file, image, video, etc.)
  • Metadata: Information about the data (tags, timestamps, permissions)
  • Unique Identifier: Globally unique ID for retrieval

7.2.2 Key Features

  • Flat namespace: No hierarchical directory structure
  • Scalability: Easily scale to petabytes
  • Metadata-rich: Extensive custom metadata support
  • HTTP-based access: RESTful APIs (often S3-compatible)
  • Durability: Built-in replication and redundancy
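
To make the flat namespace concrete, here is a minimal in-memory sketch. The names `StoredObject` and `ToyObjectStore` are hypothetical and do not correspond to any real product's API. The sketch assumes only what the lists above describe: each object carries data, metadata, and a unique identifier, and keys live in a single flat map, so a "folder" is just a key prefix.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: content, metadata, and a unique identifier."""
    data: bytes
    metadata: dict = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ToyObjectStore:
    """Hypothetical in-memory store; all keys live in one flat dictionary."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        obj = StoredObject(data=data, metadata=dict(metadata or {}))
        self._objects[key] = obj
        return obj.object_id

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Directories" are only a naming convention: listing is a prefix match.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("users/123/profile.jpg", b"...", {"user-id": "123"})
store.put("users/123/cover.jpg", b"...", {"user-id": "123"})
store.put("users/456/profile.jpg", b"...", {"user-id": "456"})
print(store.list_keys("users/123/"))
# ['users/123/cover.jpg', 'users/123/profile.jpg']
```

Real object stores add replication, access control, and HTTP endpoints on top of this idea; the sketch only shows the key-to-object mapping.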

7.2.3 Use Cases

  • Static assets: Images, videos, documents
  • Backup and archiving: Long-term data retention
  • Big data analytics: Data lakes
  • Content distribution: Media streaming

7.2.5 Example: Storing User Uploads

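import boto3

# Assumes AWS credentials and a default region are configured in the environment
s3_client = boto3.client('s3')

# Example source of the bytes; file_data can come from any upload handler
with open('profile.jpg', 'rb') as f:
    file_data = f.read()

# Upload file to object storage
s3_client.put_object(
    Bucket='user-uploads',
    Key='users/123/profile.jpg',
    Body=file_data,
    ContentType='image/jpeg',  # standard header, so not part of custom metadata
    Metadata={
        'user-id': '123',
        'upload-date': '2024-01-01'
    }
)

# Retrieve file
response = s3_client.get_object(
    Bucket='user-uploads',
    Key='users/123/profile.jpg'
)
image_bytes = response['Body'].read()  # Body is a streaming object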

7.3 RAID (Redundant Array of Independent Disks)

RAID is a technology that combines multiple physical disk drives into a single logical unit to improve:

  • Performance: Data striping across disks
  • Reliability: Data redundancy and fault tolerance
  • Capacity: Aggregated storage space

7.3.1 How RAID Works

A RAID controller, implemented in hardware or in software, manages all operations:

  • Distributes data across disks
  • Handles parity calculations
  • Monitors disk health
  • Manages rebuilds after failure

7.3.2 Why Use RAID?

Without RAID:

  • Single disk failure = data loss
  • Limited performance
  • No redundancy

With RAID:

  • Fault tolerance (depends on level)
  • Improved read/write performance
  • Hot-swappable drives (where the hardware supports it)
  • Automatic rebuild capabilities

7.4 RAID Levels

7.4.1 RAID 0 – Striping

Configuration: Minimum 2 disks

How it works:

  • Data split into blocks
  • Blocks distributed across all disks
  • No redundancy

[Figure: RAID 0 - Striping]

Example:

File: [A1, A2, A3, A4]
Disk 1: [A1, A3]
Disk 2: [A2, A4]
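
The round-robin placement in this example can be sketched in a few lines. The function names `stripe` and `unstripe` are illustrative, and the sketch treats each block as an opaque label rather than modelling real disk I/O.

```python
def stripe(blocks, num_disks):
    """Distribute blocks round-robin across num_disks (RAID 0 layout)."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


def unstripe(disks):
    """Reassemble the original block order from a striped layout."""
    total = sum(len(d) for d in disks)
    return [disks[i % len(disks)][i // len(disks)] for i in range(total)]


layout = stripe(["A1", "A2", "A3", "A4"], num_disks=2)
print(layout)            # [['A1', 'A3'], ['A2', 'A4']]
print(unstripe(layout))  # ['A1', 'A2', 'A3', 'A4']
```

Because every disk holds a different part of the file, `unstripe` needs all of them, which is why a single failure loses the whole array.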

Characteristics:

  • ✅ Maximum performance (parallel reads/writes)
  • ✅ Full capacity utilization
  • ✅ Simple to implement
  • ❌ No fault tolerance (any disk failure = total data loss)
  • ❌ Reliability decreases with more disks
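
The last point can be quantified with a rough model. The sketch below assumes disks fail independently and uses a made-up per-disk failure probability of 3% over some period; both the independence assumption and the number are for illustration only.

```python
def raid0_survival_probability(disk_failure_prob, num_disks):
    """Probability that no disk fails, assuming independent failures."""
    return (1.0 - disk_failure_prob) ** num_disks


# 0.03 is a hypothetical per-disk failure probability, not a measured value.
for n in (1, 2, 4, 8):
    p = raid0_survival_probability(0.03, n)
    print(f"{n} disks: {p:.3f}")
```

Since the array survives only if every disk survives, the probability shrinks geometrically as disks are added.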

Use cases:

  • Temporary data
  • Video editing workstations
  • When performance > data safety
  • Not recommended for production systems

Capacity: Total of all disks (2 x 1TB = 2TB)


7.4.2 RAID 1 – Mirroring

Configuration: Minimum 2 disks

How it works:

  • Data duplicated across all disks
  • Each disk contains identical copy

[Figure: RAID 1 - Mirroring]

Example:

File: [A1, A2, A3, A4]
Disk 1: [A1, A2, A3, A4]
Disk 2: [A1, A2, A3, A4]  ← Exact copy
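
A toy model of this behaviour, with the hypothetical class name `MirroredArray`, might look as follows. It only illustrates that writes go to every disk and that reads succeed as long as one copy remains; it does not model resynchronisation.

```python
class MirroredArray:
    """Toy RAID 1: every write goes to all disks; reads use any healthy disk."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no, data):
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def read(self, block_no):
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")

    def fail(self, disk_index):
        self.failed.add(disk_index)


array = MirroredArray()
for n, block in enumerate(["A1", "A2", "A3", "A4"]):
    array.write(n, block)
array.fail(0)  # lose Disk 1
print([array.read(n) for n in range(4)])  # ['A1', 'A2', 'A3', 'A4']
```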

Characteristics:

  • ✅ Excellent fault tolerance
  • ✅ Fast read performance (parallel reads)
  • ✅ Simple to implement
  • ✅ Easy recovery (just copy from working disk)
  • ❌ 50% capacity loss (2 x 1TB = 1TB usable)
  • ❌ Write performance no better than a single disk (every write goes to all mirrors)
  • ❌ Expensive (need 2x disks)

Use cases:

  • Operating system drives
  • Mission-critical databases
  • When data safety is paramount
  • Small storage needs with high reliability

Capacity: Half of total (2 x 1TB = 1TB usable)


7.4.3 RAID 2 – Bit-level Striping with Hamming Code

Configuration: Multiple disks with dedicated ECC disks

How it works:

  • Data striped at bit level
  • Hamming code for error correction
  • Dedicated disks store the error-correcting code (ECC) bits

[Figure: RAID 2]

Characteristics:

  • ❌ Obsolete (replaced by RAID 3, 4, 5)
  • ❌ Complex implementation
  • ❌ Several ECC disks required

Use cases:

  • Rarely used in practice
  • Historical significance only

7.4.4 RAID 3 – Byte-level Striping with Parity

Configuration: Minimum 3 disks (data + 1 parity)

How it works:

  • Data striped at byte level
  • Single dedicated parity disk
  • Parity allows reconstruction
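
Parity in RAID 3, 4, and 5 is typically a bytewise XOR of the data blocks. The sketch below uses short byte strings as stand-ins for disk contents and a hypothetical helper `xor_blocks` to show why one lost block can be rebuilt from the survivors plus parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"disk", b"fail", b"test"]   # blocks on three data disks
parity = xor_blocks(data)            # stored on the parity disk

# Simulate losing the second data disk and rebuilding it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'fail'
```

XOR-ing the parity with the surviving blocks cancels them out and leaves the missing block, which works for exactly one missing block per stripe.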

[Figure: RAID 3]

Characteristics:

  • ✅ Good for sequential access
  • ✅ High data transfer rates
  • ❌ Poor random access performance
  • ❌ Parity disk can be bottleneck
  • ❌ Rarely used (RAID 5 preferred)

Use cases:

  • Video streaming servers
  • Large sequential file access
  • Mostly superseded by RAID 5

7.4.5 RAID 4 – Block-level Striping with Parity

Configuration: Minimum 3 disks

How it works:

  • Data striped at block level
  • Single dedicated parity disk
  • Can survive single disk failure

[Figure: RAID 4]

Characteristics:

  • ✅ Better random reads than RAID 3
  • ✅ Efficient capacity use
  • ❌ Parity disk write bottleneck
  • ❌ Rarely used (RAID 5 distributes parity)

Use cases:

  • Largely obsolete
  • RAID 5 is superior in almost all cases

7.4.6 RAID 5 – Striping with Distributed Parity

Configuration: Minimum 3 disks

How it works:

  • Data and parity striped across all disks
  • Parity distributed (no single parity disk)
  • Can survive single disk failure

[Figure: RAID 5]

Example with 3 disks:

Disk 1: [A1, A2, P3]
Disk 2: [B1, P2, B3]
Disk 3: [P1, C2, C3]
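
The rotation in this example can be generated programmatically. The sketch below reproduces the layout shown above, where the letter names the disk and the number names the stripe. The rotation order (parity starting on the last disk and moving one disk to the left per stripe) is an assumption read off the example; real implementations offer several layout variants.

```python
def raid5_layout(num_disks, num_stripes):
    """Return per-disk labels with parity rotating one disk per stripe."""
    disks = [[] for _ in range(num_disks)]
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        for d in range(num_disks):
            letter = chr(ord("A") + d)
            label = f"P{stripe + 1}" if d == parity_disk else f"{letter}{stripe + 1}"
            disks[d].append(label)
    return disks


for number, contents in enumerate(raid5_layout(3, 3), start=1):
    print(f"Disk {number}: {contents}")
# Disk 1: ['A1', 'A2', 'P3']
# Disk 2: ['B1', 'P2', 'B3']
# Disk 3: ['P1', 'C2', 'C3']
```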

Characteristics:

  • ✅ Good balance of performance, capacity, reliability
  • ✅ Better write performance than RAID 4
  • ✅ Efficient capacity use (n-1 disks usable)
  • ✅ Can survive single disk failure
  • ❌ Slow rebuild times (every surviving disk must be read in full to recompute the missing data)
  • ❌ Vulnerable during rebuild
  • ❌ Write penalty (read-modify-write for parity)
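
The read-modify-write penalty comes from how parity is updated when only one block in a stripe changes. Assuming XOR parity, the new parity can be derived from the old data and old parity without touching the other disks. The helper names below are illustrative.

```python
def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Return the new parity for a single-block update.

    Costs two reads (old data, old parity) and two writes (new data, new parity).
    """
    return xor(xor(old_parity, old_data), new_data)


d1, d2 = b"\x0f\x0f", b"\xf0\x01"
parity = xor(d1, d2)
new_d1 = b"\xaa\x55"
new_parity = small_write(d1, parity, new_d1)
print(new_parity == xor(new_d1, d2))  # True
```

One logical write therefore turns into four disk operations, which is the overhead the "Write penalty" bullet refers to.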

Use cases:

  • Widely deployed general-purpose RAID level
  • General-purpose file servers
  • Application servers
  • Database servers (with moderate write load)

Capacity: (N-1) × Disk Size (3 x 1TB = 2TB usable)

Performance:

  • Reads: Good (parallel)
  • Writes: Moderate (parity overhead)

7.4.7 RAID 6 – Striping with Double Parity

Configuration: Minimum 4 disks

How it works:

  • Like RAID 5, but with two parity blocks per stripe
  • Both parity blocks distributed across all disks
  • Can survive two simultaneous disk failures

[Figure: RAID 6]

Characteristics:

  • ✅ Survives 2 disk failures
  • ✅ Safer during rebuilds
  • ✅ Better for large arrays (more disks = higher failure probability)
  • ❌ Slower writes (double parity calculation)
  • ❌ More complex controller
  • ❌ Lower usable capacity than RAID 5

Use cases:

  • Critical data with high availability requirements
  • Large disk arrays (>6 disks)
  • Environments where rebuild time is long
  • When double redundancy required

Capacity: (N-2) × Disk Size (4 x 1TB = 2TB usable)


7.4.8 RAID 10 (1+0) – Mirroring + Striping

Configuration: Minimum 4 disks (even number required)

How it works:

  1. Create RAID 1 mirrors (pairs of disks)
  2. Stripe across the mirrored sets (RAID 0)

[Figure: RAID 10]

Example with 4 disks:

Mirror 1: Disk 1 ↔ Disk 2
Mirror 2: Disk 3 ↔ Disk 4
RAID 0 across Mirror 1 and Mirror 2
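
The two layers can be sketched as a block-to-mirror mapping plus a survival check. The function names and the disk numbering (mirror m holds disks 2m+1 and 2m+2, matching the four-disk example) are assumptions made for this illustration.

```python
def raid10_location(block_no, num_mirrors):
    """Map a logical block to (mirror index, offset within that mirror)."""
    return block_no % num_mirrors, block_no // num_mirrors


def raid10_survives(failed_disks, num_mirrors):
    """Array survives unless both disks of some mirror pair have failed.

    Disks are numbered from 1; mirror m holds disks 2m+1 and 2m+2.
    """
    failed = set(failed_disks)
    return not any({2 * m + 1, 2 * m + 2} <= failed for m in range(num_mirrors))


print(raid10_survives({1, 3}, num_mirrors=2))  # True: one disk lost per mirror
print(raid10_survives({1, 2}, num_mirrors=2))  # False: Mirror 1 lost entirely
```

This is why the fault tolerance of RAID 10 depends on which disks fail, not only on how many.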

Characteristics:

  • ✅ Excellent performance (reads and writes)
  • ✅ High fault tolerance (can survive multiple failures if in different mirrors)
  • ✅ Fast rebuild (just copy from mirror)
  • ✅ No parity overhead
  • ❌ 50% capacity loss
  • ❌ Expensive (requires many disks)

Use cases:

  • High-performance databases
  • I/O intensive applications
  • When both performance and reliability critical
  • Enterprise applications

Capacity: 50% of total (4 x 1TB = 2TB usable)

Performance:

  • Reads: Excellent
  • Writes: Excellent

7.5 RAID Comparison Table

RAID Level  Min Disks  Usable Capacity  Fault Tolerance  Read Perf  Write Perf  Use Case
RAID 0      2          100%             None             Excellent  Excellent   Temp data, performance
RAID 1      2          50%              1 disk           Good       Moderate    OS drives, small critical
RAID 5      3          (N-1)/N          1 disk           Good       Moderate    General purpose
RAID 6      4          (N-2)/N          2 disks          Good       Moderate    Large arrays, critical
RAID 10     4          50%              Multiple*        Excellent  Excellent   Databases, high perf

*Can survive multiple disk failures if they’re in different mirror sets
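
The capacity column can be turned into a small calculator. This is a sketch that assumes equal-sized disks; the function name `usable_capacity_tb` is hypothetical, and RAID 1 is modelled as a single mirror set, so adding disks beyond two adds copies rather than space.

```python
def usable_capacity_tb(level, num_disks, disk_tb):
    """Usable capacity in TB for equal-sized disks, per the table above."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return num_disks * disk_tb
    if level == "1":
        return disk_tb                      # every disk holds the same data
    if level == "5":
        return (num_disks - 1) * disk_tb    # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_tb    # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return num_disks // 2 * disk_tb


for level, n in [("0", 2), ("1", 2), ("5", 3), ("6", 4), ("10", 4)]:
    print(f"RAID {level}: {usable_capacity_tb(level, n, 1)} TB usable")
```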

7.6 Hardware vs Software RAID

7.6.1 Hardware RAID

Dedicated RAID controller card

Pros:

  • Better performance (dedicated processor)
  • Battery-backed cache
  • No CPU overhead on host
  • Often hot-swappable

Cons:

  • Expensive
  • Controller failure may require an identical or compatible controller to read the array
  • Vendor lock-in

7.6.2 Software RAID

Operating system manages RAID

Pros:

  • No additional hardware cost
  • Flexible configuration
  • No vendor lock-in
  • Easy to migrate

Cons:

  • CPU overhead
  • Potentially lower performance
  • OS-dependent

Popular software RAID:

  • Linux: mdadm
  • Windows: Storage Spaces
  • ZFS (Solaris, FreeBSD, Linux)

7.7 Best Practices

7.7.1 Choose RAID Level Based on Needs

  • Performance priority: RAID 0, RAID 10
  • Cost + reliability: RAID 5
  • Maximum reliability: RAID 6, RAID 10
  • Small critical data: RAID 1

7.7.2 Use Enterprise-Grade Disks

  • Higher MTBF (Mean Time Between Failures)
  • Better error handling
  • Worth the investment for production

7.7.3 Monitor Disk Health

  • SMART monitoring
  • Proactive disk replacement
  • Alert on disk errors
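
As one possible way to alert on array problems, the sketch below assumes Linux software RAID (md), which reports array state in /proc/mdstat with a marker such as [UU] for healthy members and [U_] when one is missing. The sample text is illustrative rather than captured from a real host, and the function name `degraded_arrays` is hypothetical. Per-disk SMART data would need a separate tool.

```python
import re

# Illustrative sample in the style of /proc/mdstat; not captured from a real host.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid5 sde1[3](F) sdd1[1] sdc1[0]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of arrays whose member-status marker contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if status and current and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # ['md1']
```

In practice the text would be read from /proc/mdstat on a schedule and a non-empty result would trigger an alert.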

7.7.4 Regular Backups

  • RAID is NOT a backup!
  • Protects against disk failure, not:
    • Accidental deletion
    • Ransomware
    • Data corruption
    • Natural disasters

7.7.5 Hot Spares

  • Keep spare disk(s) in array
  • Automatic rebuild on failure
  • Reduces downtime

7.7.6 Plan for Rebuild Time

  • Large disks = long rebuild (hours to days)
  • System vulnerable during rebuild
  • Consider RAID 6 for large arrays
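
A back-of-the-envelope estimate shows why large disks rebuild slowly. The sketch assumes the rebuild is limited by writing the whole replacement disk at a sustained rate. The 100 MB/s figure is a hypothetical value for illustration, and real rebuilds under production load are often slower.

```python
def rebuild_hours(disk_tb, rebuild_mb_per_s):
    """Lower-bound rebuild time: one full pass over the replacement disk."""
    disk_mb = disk_tb * 1_000_000        # decimal units: 1 TB = 1,000,000 MB
    return disk_mb / rebuild_mb_per_s / 3600


# 100 MB/s is an assumed sustained rebuild rate, not a measured one.
for size_tb in (1, 4, 16):
    print(f"{size_tb} TB at 100 MB/s: {rebuild_hours(size_tb, 100):.1f} h")
```

The time grows linearly with disk size, so the window in which a second failure would be fatal for RAID 5 grows with it.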

7.8 Summary

Storage design involves choosing the right technology for your needs:

  • Object storage for scalable, unstructured data
  • RAID for redundancy and performance at the disk level

Key takeaways:

  • RAID 0: Performance, no reliability
  • RAID 1: Simple mirroring
  • RAID 5: Best general-purpose choice
  • RAID 6: Extra safety for large arrays
  • RAID 10: Performance + reliability (expensive)
  • RAID ≠ Backup

Choose based on your performance, capacity, and reliability requirements.