Content Role: pillar

Database Sharding: Horizontal Scaling Patterns

Partition key selection and cross-shard queries in distributed SQL

When your database reaches millions of rows and query performance degrades despite indexing optimizations, you face a fundamental architectural decision: vertical scaling (bigger hardware) or horizontal scaling (more machines). Database sharding—partitioning data across multiple database instances—represents the horizontal approach, but it introduces complexity that can cripple your application if implemented incorrectly.

The Problem: When Single-Instance Databases Hit Their Limits

A monolithic database eventually encounters hard limits. You might observe:

Query latency increasing linearly with data volume despite proper indexing
Write throughput bottlenecks as a single instance handles all transactions
Storage constraints requiring expensive hardware upgrades
Backup and recovery operations taking hours or days
Connection pool exhaustion during traffic spikes

Consider an e-commerce platform storing 500 million orders. Even with optimized indexes, queries scanning recent orders across all customers become progressively slower. The database CPU maxes out during peak hours, and adding read replicas only helps read-heavy workloads—write operations still bottleneck on the primary instance.

Understanding Database Sharding Strategy

Sharding distributes data across multiple independent database instances (shards), each containing a subset of the total dataset. Unlike replication, where each instance holds complete data copies, sharding partitions data so each shard stores unique rows.

Core Sharding Concepts

Shard Key: The column(s) determining which shard stores each row. This is your most critical design decision.

Routing Layer: Logic directing queries to appropriate shards based on the shard key.

Shard Map: Metadata mapping shard key ranges or hash values to physical database instances.

Partition Key Selection Patterns

Your shard key fundamentally shapes system performance and operational complexity.

Hash-Based Sharding

Distribute data evenly by hashing the shard key:

interface ShardConfig {
  shardCount: number;
  shards: DatabaseConnection[];
}

class HashShardRouter {
  constructor(private config: ShardConfig) {}

  getShardForKey(userId: string): DatabaseConnection {
    const hash = this.hashFunction(userId);
    const shardIndex = hash % this.config.shardCount;
    return this.config.shards[shardIndex];
  }

  private hashFunction(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash) + key.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }

  async insertOrder(order: Order): Promise<void> {
    const shard = this.getShardForKey(order.userId);
    await shard.query(
      'INSERT INTO orders (id, user_id, total, created_at) VALUES ($1, $2, $3, $4)',
      [order.id, order.userId, order.total, order.createdAt]
    );
  }
}

Advantages: Even distribution, simple implementation Disadvantages: Range queries require querying all shards, resharding is complex

Range-Based Sharding

Partition data by key ranges:

interface ShardRange {
  minKey: string;
  maxKey: string;
  connection: DatabaseConnection;
}

class RangeShardRouter {
  constructor(private ranges: ShardRange[]) {
    // Ranges must be sorted by minKey
    this.ranges.sort((a, b) => a.minKey.localeCompare(b.minKey));
  }

  getShardForKey(customerId: string): DatabaseConnection {
    for (const range of this.ranges) {
      if (customerId >= range.minKey && customerId <= range.maxKey) {
        return range.connection;
      }
    }
    throw new Error(`No shard found for key: ${customerId}`);
  }

  async getCustomerOrders(
    customerId: string,
    startDate: Date,
    endDate: Date
  ): Promise<Order[]> {
    const shard = this.getShardForKey(customerId);
    const result = await shard.query(
      'SELECT * FROM orders WHERE customer_id = $1 AND created_at BETWEEN $2 AND $3',
      [customerId, startDate, endDate]
    );
    return result.rows;
  }
}

Advantages: Efficient range queries, easier to rebalance specific ranges Disadvantages: Risk of hotspots if data isn't uniformly distributed

Geographic Sharding

Partition by geographic region for latency optimization:

enum Region {
  US_EAST = 'us-east',
  US_WEST = 'us-west',
  EU = 'eu',
  ASIA = 'asia'
}

class GeoShardRouter {
  private shardMap: Map<Region, DatabaseConnection>;

  constructor(connections: Map<Region, DatabaseConnection>) {
    this.shardMap = connections;
  }

  getShardForRegion(region: Region): DatabaseConnection {
    const shard = this.shardMap.get(region);
    if (!shard) {
      throw new Error(`No shard configured for region: ${region}`);
    }
    return shard;
  }

  async createUser(user: User, region: Region): Promise<void> {
    const shard = this.getShardForRegion(region);
    await shard.query(
      'INSERT INTO users (id, email, region, created_at) VALUES ($1, $2, $3, $4)',
      [user.id, user.email, region, new Date()]
    );
  }
}

Advantages: Reduced latency, data residency compliance Disadvantages: Uneven load distribution, complex cross-region queries

Handling Cross-Shard Queries

Cross-shard operations represent the primary complexity in sharded architectures.

Scatter-Gather Pattern

Query all shards and aggregate results:

class ScatterGatherQuery {
  constructor(private shards: DatabaseConnection[]) {}

  async getTotalRevenue(startDate: Date, endDate: Date): Promise<number> {
    const queries = this.shards.map(shard =>
      shard.query(
        'SELECT SUM(total) as revenue FROM orders WHERE created_at BETWEEN $1 AND $2',
        [startDate, endDate]
      )
    );

    const results = await Promise.all(queries);

    return results.reduce((total, result) => {
      return total + (parseFloat(result.rows[0]?.revenue) || 0);
    }, 0);
  }

  async searchUsers(emailPattern: string, limit: number): Promise<User[]> {
    const queries = this.shards.map(shard =>
      shard.query(
        'SELECT * FROM users WHERE email LIKE $1 LIMIT $2',
        [emailPattern, limit]
      )
    );

    const results = await Promise.all(queries);
    const allUsers = results.flatMap(r => r.rows);

    // Sort and limit across all shards
    return allUsers
      .sort((a, b) => a.email.localeCompare(b.email))
      .slice(0, limit);
  }
}

Denormalization Strategy

Duplicate data to avoid cross-shard joins:

interface OrderWithCustomer {
  orderId: string;
  userId: string;
  // Denormalized customer data
  customerEmail: string;
  customerName: string;
  total: number;
}

class DenormalizedOrderService {
  async createOrder(
    order: Order,
    customer: Customer,
    router: HashShardRouter
  ): Promise<void> {
    const shard = router.getShardForKey(order.userId);

    // Store order with denormalized customer data
    await shard.query(
      `INSERT INTO orders 
       (id, user_id, customer_email, customer_name, total, created_at) 
       VALUES ($1, $2, $3, $4, $5, $6)`,
      [
        order.id,
        order.userId,
        customer.email,
        customer.name,
        order.total,
        new Date()
      ]
    );
  }
}

Common Pitfalls

Choosing the Wrong Shard Key

Selecting a low-cardinality key (e.g., country code with only 50 values) creates hotspots. Always choose high-cardinality keys that distribute data evenly.

Ignoring Transaction Boundaries

Distributed transactions across shards are expensive and complex. Design your schema so transactions stay within single shards:

// BAD: Cross-shard transaction
async transferBetweenUsers(fromUserId: string, toUserId: string, amount: number) {
  // fromUserId and toUserId might be on different shards
  // Requires distributed transaction coordination
}

// GOOD: Single-shard transaction
async transferWithinAccount(userId: string, fromWallet: string, toWallet: string, amount: number) {
  // All data for userId is on the same shard
  const shard = this.router.getShardForKey(userId);
  await shard.query('BEGIN');
  // ... transfer logic
  await shard.query('COMMIT');
}

Premature Sharding

Sharding adds significant complexity. Exhaust vertical scaling and read replicas first. Shard only when:

Single-instance performance is consistently inadequate
Data volume exceeds practical single-instance limits
Write throughput requires horizontal distribution

Neglecting Monitoring

Implement per-shard metrics:

class MonitoredShardRouter {
  private metrics: Map<string, ShardMetrics> = new Map();

  async executeQuery(shardKey: string, query: string): Promise<any> {
    const startTime = Date.now();
    const shard = this.getShardForKey(shardKey);

    try {
      const result = await shard.query(query);
      this.recordMetric(shard.id, Date.now() - startTime, true);
      return result;
    } catch (error) {
      this.recordMetric(shard.id, Date.now() - startTime, false);
      throw error;
    }
  }

  private recordMetric(shardId: string, latency: number, success: boolean) {
    // Send to monitoring system (Prometheus, DataDog, etc.)
  }
}

Best Practices Checklist

Choose shard keys based on access patterns: Analyze your most frequent queries
Maintain shard key immutability: Never update shard key values; it requires moving data between shards
Implement connection pooling per shard: Avoid connection exhaustion
Design for single-shard queries: Structure data to minimize cross-shard operations
Plan resharding strategy upfront: Use consistent hashing or virtual shards for easier rebalancing
Monitor shard balance: Track data distribution and query load per shard
Automate shard provisioning: Prepare infrastructure-as-code for adding shards
Document shard topology: Maintain clear documentation of shard mappings and routing logic
Test failure scenarios: Verify behavior when individual shards become unavailable
Implement circuit breakers: Prevent cascading failures when shards experience issues

FAQ

Q: When should I implement sharding versus using read replicas?

Read replicas only help read-heavy workloads. If write throughput is your bottleneck, or if your dataset exceeds single-instance storage capacity, sharding is necessary. Use read replicas first for read scaling; add sharding when writes or storage become constraints.

Q: Can I change the shard key after implementation?

Changing shard keys requires migrating all data, essentially rebuilding your sharded architecture. Choose carefully upfront. If you must change it, plan for significant downtime or implement a complex dual-write migration strategy.

Q: How do I handle foreign key relationships across shards?

You can't enforce foreign key constraints across shards at the database level. Options include: (1) denormalize data to keep related records on the same shard, (2) enforce referential integrity in application code, or (3) use eventual consistency patterns with background validation jobs.

Q: What's the difference between sharding and partitioning?

Partitioning typically refers to dividing a table within a single database instance (horizontal or vertical partitioning). Sharding distributes data across multiple independent database instances. Sharding is a form of horizontal partitioning at the infrastructure level.

Q: How many shards should I start with?

Start with 2-4 shards to validate your sharding logic without excessive complexity. Plan for growth by using consistent hashing or virtual shards that allow adding physical shards without complete data redistribution. Over-sharding initially creates unnecessary operational overhead.

Q: How do I perform backups in a sharded environment?

Backup each shard independently, but coordinate timing to maintain consistency for cross-shard relationships. Use point-in-time recovery capabilities and maintain backup metadata indicating which backups correspond to the same logical point in time across all shards.

Q: What happens when one shard fails?

Design for partial availability. Implement health checks and circuit breakers so your application can continue serving requests for data on healthy shards. Consider replication within each shard (primary-replica setup per shard) for high availability. Have runbooks for shard recovery procedures.

Database Sharding: Horizontal Scaling Patterns

Database Sharding: Horizontal Scaling Patterns

Partition key selection and cross-shard queries in distributed SQL

The Problem: When Single-Instance Databases Hit Their Limits

Understanding Database Sharding Strategy

Core Sharding Concepts

Partition Key Selection Patterns

Hash-Based Sharding

Range-Based Sharding

Geographic Sharding

Handling Cross-Shard Queries

Scatter-Gather Pattern

Denormalization Strategy

Common Pitfalls

Choosing the Wrong Shard Key

Ignoring Transaction Boundaries

Premature Sharding

Neglecting Monitoring

Best Practices Checklist

FAQ

Comments

More from this blog

Embedding-First Architecture for Real-World LLM Apps

AI/ML Modern Patterns

Containers/K8s Modern Patterns

Containers/K8s Modern Patterns

Containers/K8s Modern Patterns

Command Palette

Database Sharding: Horizontal Scaling Patterns

Partition key selection and cross-shard queries in distributed SQL

The Problem: When Single-Instance Databases Hit Their Limits

Understanding Database Sharding Strategy

Core Sharding Concepts

Partition Key Selection Patterns

Hash-Based Sharding

Range-Based Sharding

Geographic Sharding

Handling Cross-Shard Queries

Scatter-Gather Pattern

Denormalization Strategy

Common Pitfalls

Choosing the Wrong Shard Key

Ignoring Transaction Boundaries

Premature Sharding

Neglecting Monitoring

Best Practices Checklist

FAQ

Comments

More from this blog