Distributed Caching: Redis Cluster Configuration

Cache invalidation patterns and read-through strategies at scale

Distributed caching remains one of the most effective techniques for reducing database load and improving application response times. However, implementing a robust distributed caching strategy requires careful consideration of consistency models, invalidation patterns, and failure scenarios. This guide examines production-ready Redis Cluster configurations with practical TypeScript implementations.

The Problem with Naive Caching Approaches

Most applications start with simple in-memory caching or a single Redis instance. These approaches fail under several conditions:

Single point of failure: A single Redis instance creates availability risks. When it fails, all cache requests hit the database directly, potentially causing cascading failures.

Memory limitations: Individual nodes have finite memory. As data grows, you face eviction pressure that degrades cache hit rates.

Cache coherence: Multiple application instances with local caches create consistency problems. Stale data persists until TTL expiration, leading to incorrect application behavior.

Geographic distribution: Users in different regions experience high latency when accessing a centralized cache.

A distributed caching strategy addresses these limitations through data partitioning, replication, and coordinated invalidation.

Redis Cluster Architecture Fundamentals

Redis Cluster provides automatic sharding across multiple nodes without requiring external coordination services. Understanding its architecture is essential for effective configuration.

Hash Slot Distribution

Redis Cluster divides the key space into 16,384 hash slots. Each master node owns a subset of these slots. The cluster calculates slot assignment using CRC16:

HASH_SLOT = CRC16(key) mod 16384

For keys with hash tags, only the portion between curly braces determines the slot:

// These keys map to the same slot
const userKey = "user:{12345}:profile";
const userOrders = "user:{12345}:orders";

This enables multi-key operations on related data.
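The slot calculation can be reproduced on the client, which is handy for checking that related keys really land together. The following is a minimal sketch, assuming the CRC16 variant Redis Cluster uses (XMODEM, polynomial 0x1021). The names `crc16` and `hashSlot` are illustrative and not part of any library.

```typescript
// CRC16 (XMODEM variant): polynomial 0x1021, initial value 0, no bit reflection.
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  bytes.forEach((byte) => {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? (crc << 1) ^ 0x1021 : crc << 1;
      crc &= 0xffff; // keep the register at 16 bits
    }
  });
  return crc;
}

// Maps a key to one of the 16,384 slots, honoring hash tags.
function hashSlot(key: string): number {
  let hashed = key;
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    // Only a non-empty tag between the first "{" and the next "}" counts.
    if (close > open + 1) {
      hashed = key.slice(open + 1, close);
    }
  }
  return crc16(new TextEncoder().encode(hashed)) % 16384;
}
```

With this sketch, `hashSlot("user:{12345}:profile")` and `hashSlot("user:{12345}:orders")` return the same slot, because only `12345` is hashed in both cases.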

Replication and Failover

Each master node can have one or more replica nodes. Replicas sync with their master continuously but asynchronously, so a write acknowledged just before a failure can be lost. When a master fails, one of its replicas is promoted automatically; the promotion requires agreement from a majority of masters.

Production Redis Cluster Configuration

Here's a baseline production configuration for one node of a three-master, three-replica cluster. The remaining nodes use the same settings with their own port and file names:

# redis-node-1.conf
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
appendfilename "appendonly-7000.aof"

# Replication settings
repl-diskless-sync yes
repl-diskless-sync-delay 5

# Memory management
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence
save 900 1
save 300 10
save 60 10000

# Network
tcp-backlog 511
timeout 0
tcp-keepalive 300

Key configuration decisions:

cluster-node-timeout: The time in milliseconds a node may be unreachable before it is considered failing, which controls failure detection sensitivity. Lower values enable faster failover but increase false positives during brief network interruptions.

maxmemory-policy: allkeys-lru evicts least recently used keys regardless of TTL. Use volatile-lru to only evict keys with expiration set.

repl-diskless-sync: Enables direct socket-to-socket replication without intermediate disk writes, reducing sync time.
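Since only the port and the port-derived file names differ between nodes, the per-node files can be generated rather than copied by hand. The sketch below assumes six nodes on ports 7000 to 7005 (only port 7000 appears in the configuration above) and carries a subset of the settings; `renderNodeConfig` is a hypothetical helper.

```typescript
// Settings shared by every node, taken from the configuration above.
const sharedSettings = [
  'cluster-enabled yes',
  'cluster-node-timeout 5000',
  'appendonly yes',
  'maxmemory 2gb',
  'maxmemory-policy allkeys-lru',
];

// Hypothetical helper: only the port and the port-derived file names vary.
function renderNodeConfig(port: number): string {
  return [
    `port ${port}`,
    `cluster-config-file nodes-${port}.conf`,
    `appendfilename "appendonly-${port}.aof"`,
    ...sharedSettings,
  ].join('\n') + '\n';
}

// Assumed layout: six nodes, enough for three masters with one replica each.
const configs = new Map<string, string>();
for (let i = 0; i < 6; i++) {
  configs.set(`redis-node-${i + 1}.conf`, renderNodeConfig(7000 + i));
}
```

Each map entry can then be written to disk with whatever deployment tooling is already in place.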

Implementing Read-Through Cache Pattern

The read-through pattern centralizes cache logic, ensuring consistent behavior across your application:

import { Cluster } from 'ioredis';

interface CacheConfig {
  ttl: number;
  refreshThreshold?: number;
}

class ReadThroughCache<T> {
  private cluster: Cluster;

  constructor(nodes: string[]) {
    this.cluster = new Cluster(
      nodes.map(node => {
        const [host, port] = node.split(':');
        return { host, port: parseInt(port, 10) };
      }),
      {
        redisOptions: {
          maxRetriesPerRequest: 3,
          enableReadyCheck: true,
        },
        clusterRetryStrategy: (times) => {
          return Math.min(times * 100, 2000);
        },
      }
    );
  }

  async get(
    key: string,
    loader: () => Promise<T>,
    config: CacheConfig
  ): Promise<T> {
    try {
      const cached = await this.cluster.get(key);

      if (cached !== null) {
        const data = JSON.parse(cached) as T;

        // Proactive refresh for near-expiry keys
        if (config.refreshThreshold) {
          const ttl = await this.cluster.ttl(key);
          if (ttl > 0 && ttl < config.refreshThreshold) {
            this.refreshAsync(key, loader, config.ttl);
          }
        }

        return data;
      }
    } catch (error) {
      console.error('Cache read error:', error);
      // Fall through to loader
    }

    return this.loadAndCache(key, loader, config.ttl);
  }

  private async loadAndCache(
    key: string,
    loader: () => Promise<T>,
    ttl: number
  ): Promise<T> {
    const data = await loader();

    try {
      await this.cluster.setex(
        key,
        ttl,
        JSON.stringify(data)
      );
    } catch (error) {
      console.error('Cache write error:', error);
      // Return data even if caching fails
    }

    return data;
  }

  private refreshAsync(
    key: string,
    loader: () => Promise<T>,
    ttl: number
  ): void {
    this.loadAndCache(key, loader, ttl).catch(error => {
      console.error('Background refresh failed:', error);
    });
  }
}

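This implementation refreshes entries in the background once their remaining TTL drops below refreshThreshold, so popular keys are usually repopulated before they expire. This reduces the risk of a cache stampede but does not eliminate it: concurrent requests can each trigger a refresh, and a key that is missing entirely is still loaded by every caller that misses.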
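One way to keep a single process from running the same loader many times at once is to coalesce concurrent loads per key. The following is a minimal sketch, not part of the class above: `SingleFlight` is a hypothetical helper, and it only deduplicates within one process, not across application instances.

```typescript
// Hypothetical helper: concurrent calls for the same key share one in-flight load.
class SingleFlight<T> {
  private inflight = new Map<string, Promise<T>>();

  run(key: string, loader: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) {
      return existing; // join the load that is already running
    }

    const pending = loader();
    this.inflight.set(key, pending);

    // Forget the entry once the load settles, whether it succeeded or failed.
    const clear = () => {
      if (this.inflight.get(key) === pending) {
        this.inflight.delete(key);
      }
    };
    pending.then(clear, clear);

    return pending;
  }
}
```

Wrapping the `loader` passed to `loadAndCache` in `singleFlight.run(key, loader)` would let a burst of near-expiry hits share one database query.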

Cache Invalidation Patterns

Cache invalidation is notoriously difficult. Here are three proven patterns:

1. Time-Based Expiration with Versioning

interface VersionedCache {
  version: number;
  data: any;
  timestamp: number;
}

class VersionedCacheManager {
  private versionKey = (entity: string) => `version:${entity}`;

  constructor(private readonly cluster: Cluster) {}

  // `entity` names the version counter shared by a group of keys, e.g. "user:12345"
  async set(entity: string, key: string, data: any, ttl: number): Promise<void> {
    const version = await this.getVersion(entity);
    const cached: VersionedCache = {
      version,
      data,
      timestamp: Date.now(),
    };

    await this.cluster.setex(key, ttl, JSON.stringify(cached));
  }

  async get(entity: string, key: string): Promise<any | null> {
    const cached = await this.cluster.get(key);
    if (!cached) return null;

    const parsed: VersionedCache = JSON.parse(cached);
    const currentVersion = await this.getVersion(entity);

    // An entry written under an older version is treated as a miss
    if (parsed.version !== currentVersion) {
      await this.cluster.del(key);
      return null;
    }

    return parsed.data;
  }

  async invalidate(entity: string): Promise<void> {
    // Bumping the counter invalidates every key written under this entity
    await this.cluster.incr(this.versionKey(entity));
  }

  private async getVersion(entity: string): Promise<number> {
    const version = await this.cluster.get(this.versionKey(entity));
    return version ? parseInt(version, 10) : 0;
  }
}
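A variation on this pattern, shown here only as a sketch and not as the approach used above, puts the version into the key name itself. Bumping the version then changes which key is read, and entries written under older versions are never looked up again and simply age out through their TTL. The function name and key layout are assumptions for illustration.

```typescript
// Hypothetical key builder: the version is part of the key name, so a version
// bump makes readers look under a new key instead of deleting the old one.
function versionedKey(
  entity: string,
  id: string,
  version: number,
  field: string
): string {
  return `${entity}:${id}:v${version}:${field}`;
}

// Example: version 3 and version 4 of the same profile are distinct keys.
const before = versionedKey('user', '12345', 3, 'profile'); // "user:12345:v3:profile"
const after = versionedKey('user', '12345', 4, 'profile'); // "user:12345:v4:profile"
```

The trade-off is memory: superseded entries stay in Redis until their TTL expires, so this suits short TTLs better than long ones.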

2. Event-Driven Invalidation

import { EventEmitter } from 'events';

class EventDrivenCache extends EventEmitter {
  private cluster: Cluster;

  constructor(cluster: Cluster) {
    super();
    this.cluster = cluster;
    this.setupInvalidationHandlers();
  }

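  private setupInvalidationHandlers(): void {
    this.on('user:updated', (userId: string) => {
      // EventEmitter ignores returned promises, so handle rejections here
      this.invalidatePattern(`user:${userId}:*`).catch((error) => {
        console.error('User cache invalidation failed:', error);
      });
    });

    this.on('product:updated', (productId: string) => {
      // These keys can hash to different slots, and a multi-key DEL across
      // slots fails with CROSSSLOT, so delete them one at a time
      const keys = [
        `product:${productId}`,
        `product:${productId}:inventory`,
        `product:${productId}:reviews`,
      ];
      Promise.all(keys.map((key) => this.cluster.del(key))).catch((error) => {
        console.error('Product cache invalidation failed:', error);
      });
    });
  }

  private async invalidatePattern(pattern: string): Promise<void> {
    const nodes = this.cluster.nodes('master');

    await Promise.all(
      nodes.map(async (node) => {
        // SCAN walks the keyspace incrementally; KEYS would block the node
        const stream = node.scanStream({ match: pattern, count: 100 });
        for await (const keys of stream) {
          // Single-key DELs stay valid when matches span several slots
          await Promise.all((keys as string[]).map((key) => node.del(key)));
        }
      })
    );
  }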
}
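In cluster mode, a multi-key DEL only succeeds when every key hashes to the same slot. One way to guarantee that for an entity's keys is to use the entity ID as a hash tag. The sketch below assumes a key layout that differs slightly from the one in the handler above (the ID is wrapped in braces); `productKeys` is a hypothetical helper.

```typescript
// Hypothetical key scheme: the product ID is the hash tag, so every key for
// one product maps to the same slot.
function productKeys(productId: string): string[] {
  const base = `product:{${productId}}`;
  return [base, `${base}:inventory`, `${base}:reviews`];
}
```

With keys built this way, `cluster.del(...productKeys("42"))` targets a single slot. The cost is that all data for one product sits on one node, which matters if a single product becomes very hot.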

3. Write-Through with Immediate Invalidation

class WriteThroughCache {
  constructor(private readonly cluster: Cluster) {}

  async update(
    key: string,
    updater: () => Promise<any>,
    ttl: number
  ): Promise<void> {
    // Update source of truth
    const newData = await updater();

    // Immediately update cache
    await this.cluster.setex(key, ttl, JSON.stringify(newData));
  }

  async delete(key: string, deleter: () => Promise<void>): Promise<void> {
    // Delete from source
    await deleter();

    // Remove from cache
    await this.cluster.del(key);
  }
}

Common Pitfalls

Cache Stampede

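When a popular key expires, multiple requests simultaneously attempt to regenerate it. Prevent this with a lock that only its owner can release, and have waiting callers re-check the cache instead of running the loader themselves:

import { randomUUID } from 'node:crypto';

// Deletes the lock only if it still holds this caller's token
const RELEASE_LOCK = `
  if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
  end
  return 0
`;

async getWithLock(
  key: string,
  loader: () => Promise<T>,
  ttl: number,
  maxAttempts = 20
): Promise<T> {
  const lockKey = `lock:${key}`;
  const token = randomUUID();

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Another caller may have filled the cache while we waited
    const cached = await this.cluster.get(key);
    if (cached !== null) return JSON.parse(cached) as T;

    // The lock expires after 10 seconds in case its holder crashes
    const lockAcquired = await this.cluster.set(lockKey, token, 'EX', 10, 'NX');

    if (lockAcquired) {
      try {
        const data = await loader();
        await this.cluster.setex(key, ttl, JSON.stringify(data));
        return data;
      } finally {
        await this.cluster.eval(RELEASE_LOCK, 1, lockKey, token);
      }
    }

    // Wait and retry
    await new Promise(resolve => setTimeout(resolve, 100));
  }

  // Give up waiting and load directly rather than fail the request
  return loader();
}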

Hot Key Problem

A few very popular keys, or many keys forced into one slot, cause some nodes to handle disproportionate load. Solutions:

Use hash tags sparingly, since every key that shares a tag lands on the same node
Implement local caching for extremely hot keys
Add random jitter to TTLs to prevent synchronized expiration
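Two of these ideas fit in a few lines. The sketch below shows TTL jitter and, as a common complement to local caching, spreading reads of one hot key over several suffixed copies. Both function names and the `:copy:` suffix are assumptions for illustration; the random source is injectable so the behavior can be tested.

```typescript
// Returns a TTL within +/- `spread` of the base (spread is a fraction, e.g. 0.1),
// so keys written at the same moment do not all expire together.
function jitteredTtl(
  baseSeconds: number,
  spread = 0.1,
  random: () => number = Math.random
): number {
  const offset = (random() * 2 - 1) * spread * baseSeconds;
  return Math.max(1, Math.round(baseSeconds + offset));
}

// Picks one of `copies` suffixed names for a hot key. The differing names
// generally hash to different slots, which spreads reads across nodes.
function hotKeyCopy(
  key: string,
  copies: number,
  random: () => number = Math.random
): string {
  const index = Math.floor(random() * copies);
  return `${key}:copy:${index}`;
}
```

Readers would call `hotKeyCopy` on every lookup, while writers have to update all copies, so this only pays off for keys that are read far more often than they change.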

Network Partition Handling

During network splits, Redis Cluster may serve stale data: replication is asynchronous, and a master on the minority side keeps answering until cluster-node-timeout elapses. Each key lives on exactly one master and its replicas, so polling every node for a quorum does not work; nodes that do not own the slot reply with a MOVED error. For data where staleness is unacceptable, read from the owning master and treat any cache error as a miss that goes to the source of truth. This narrows the stale window but does not eliminate it:

async getWithConsistency(
  key: string,
  loadFromSource: () => Promise<T | null>
): Promise<T | null> {
  try {
    // With the default scaleReads setting ('master'), ioredis routes this
    // read to the master that owns the key's slot, never to a replica
    const cached = await this.cluster.get(key);

    if (cached !== null) {
      return JSON.parse(cached) as T;
    }
  } catch (error) {
    // CLUSTERDOWN, timeouts, or exhausted redirections: do not trust the cache
    console.error('Cache unavailable, reading from source:', error);
  }

  // Cache miss or cache failure: the source of truth decides
  return loadFromSource();
}

Best Practices Checklist

[ ] Configure appropriate maxmemory-policy based on workload
[ ] Set cluster-node-timeout balancing failover speed and stability
[ ] Enable persistence (AOF or RDB) for data durability
[ ] Implement circuit breakers for cache failures
[ ] Monitor cache hit rates and adjust TTLs accordingly
[ ] Use hash tags for related keys requiring multi-key operations
[ ] Add jitter to TTLs so keys do not expire together and cause a thundering herd
[ ] Set up monitoring for cluster health and slot distribution
[ ] Test failover scenarios in staging environments
[ ] Document cache key naming conventions and TTL policies
[ ] Implement cache warming for critical data after deployments
[ ] Reuse one long-lived cluster client per process instead of connecting per request, to prevent socket exhaustion
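The circuit breaker item from the checklist can be sketched as a small state machine that stops calling the cache after repeated failures and probes it again after a cool-down. This is a minimal illustration under assumed defaults (five failures, ten seconds), not a replacement for a tested library; the clock is injectable for testing.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetAfterMs = 10000,
    private readonly now: () => number = () => Date.now()
  ) {}

  // Returns true when a cache call may be attempted.
  allow(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetAfterMs) {
      // Cool-down elapsed: let trial calls through until one succeeds or fails.
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures++;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `allow()` before each cache read, go straight to the database when it returns false, and report the outcome of each attempted call through `recordSuccess()` or `recordFailure()`.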

FAQ

Q: How do I choose between Redis Cluster and Redis Sentinel?

Redis Cluster provides automatic sharding and is appropriate when data exceeds single-node memory. Redis Sentinel offers high availability for single-master setups without sharding. Use Cluster when you need horizontal scaling; use Sentinel when you need only failover capabilities.

Q: What's the optimal number of master nodes?

Start with three masters for production. Three is the smallest number at which the cluster can still reach a majority after losing one master, and it keeps operational complexity low. Scale to more masters when individual nodes approach memory limits or when request distribution becomes uneven.

Q: How should I handle cache misses during database outages?

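Implement stale-while-revalidate: serve stale cache entries while the database is unavailable. Because Redis removes a key once its TTL expires, this requires storing a soft expiry timestamp inside the cached value and giving the key a longer Redis TTL. When an entry is past its soft expiry, flag it as stale and attempt a background refresh. This maintains availability at the cost of temporary inconsistency.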
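Redis deletes a key when its TTL runs out, so serving stale entries needs a second, softer expiry carried in the value. The sketch below shows one possible envelope; the type and function names are hypothetical, and the time source is passed in so the logic is easy to test.

```typescript
interface Envelope<T> {
  data: T;
  freshUntil: number; // epoch milliseconds; later than this, the entry is stale but usable
}

// Wraps a value with its soft expiry before it is serialized into Redis.
function wrap<T>(data: T, softTtlSeconds: number, now: number = Date.now()): Envelope<T> {
  return { data, freshUntil: now + softTtlSeconds * 1000 };
}

// True once the soft expiry has passed and a background refresh is due.
function isStale<T>(entry: Envelope<T>, now: number = Date.now()): boolean {
  return now >= entry.freshUntil;
}

// The Redis TTL must outlive the soft TTL, or there is nothing stale left to serve.
function hardTtlSeconds(softTtlSeconds: number, staleWindowSeconds: number): number {
  return softTtlSeconds + staleWindowSeconds;
}
```

An entry would be written with `setex(key, hardTtlSeconds(60, 300), JSON.stringify(wrap(data, 60)))`; readers return `data` either way and trigger a refresh only when `isStale` is true.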

Q: Should I cache null or empty results?

Yes, cache negative results with shorter TTLs to prevent repeated database queries for non-existent data. This protects against cache penetration attacks and reduces unnecessary load.
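A cached "not found" has to be distinguishable from a cache miss. One way to do that, sketched here with hypothetical names, is a sentinel string that can never collide with JSON output, paired with a shorter TTL for negative entries.

```typescript
// Hypothetical marker: not valid JSON, so JSON.stringify can never produce it.
const NEGATIVE_SENTINEL = '__null__';

type Lookup<T> = { hit: false } | { hit: true; value: T | null };

function encodeCached<T>(value: T | null): string {
  return value === null ? NEGATIVE_SENTINEL : JSON.stringify(value);
}

// `raw` is what GET returned: null means the key is absent (a true miss).
function decodeCached<T>(raw: string | null): Lookup<T> {
  if (raw === null) return { hit: false };
  if (raw === NEGATIVE_SENTINEL) return { hit: true, value: null };
  return { hit: true, value: JSON.parse(raw) as T };
}

// Negative entries get the shorter TTL so newly created records show up quickly.
function ttlFor<T>(value: T | null, ttlSeconds: number, negativeTtlSeconds: number): number {
  return value === null ? negativeTtlSeconds : ttlSeconds;
}
```

On a miss the loader runs and its result, null or not, is stored with `encodeCached` and `ttlFor`; on a hit with `value: null` the caller returns "not found" without touching the database.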

Q: How do I migrate data between cache clusters?

Use a dual-write strategy: write to both the old and new clusters while reading from the new cluster with fallback to the old one. Once every entry on the old cluster has passed its TTL, decommission it. Alternatively, use the Redis MIGRATE command to move keys between instances online.

Q: What's the impact of network latency on cache performance?

Network latency directly affects cache effectiveness. For sub-millisecond response times, consider regional cache clusters or local caching layers. Monitor P99 latency and ensure it remains significantly lower than database query times.

Q: How do I prevent cache memory exhaustion?

Set maxmemory limits and choose appropriate eviction policies. Monitor memory usage and eviction rates. Implement tiered caching with hot data in Redis and warm data in secondary storage. Consider increasing cluster size before eviction rates impact hit ratios.
