Why Traditional Geographic Routing Falls Short

Legacy CDN implementations route traffic based on a simple premise: send users to the nearest point of presence (PoP) by geographic distance. This approach made sense when CDN infrastructure was sparse and network topology was relatively predictable. In 2025, this model breaks down for several critical reasons.

Geographic distance is an unreliable proxy for latency. A user in Singapore might experience lower latency connecting to a Tokyo PoP than to a geographically closer location in Jakarta due to superior fiber infrastructure and peering arrangements. Internet exchange points, submarine cable routes, and tier-1 carrier relationships create asymmetric network topologies where physical proximity is a poor predictor of performance.
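To make the gap concrete, here is a minimal sketch that ranks the same PoPs two ways: by great-circle distance and by measured round-trip time. The PoP names, coordinates, and RTT values are invented for illustration (they are not measurements), and `PopCandidate`, `nearestByDistance`, and `fastestByRtt` are hypothetical names.

```typescript
// Hypothetical PoP record; the rttMs values used below are invented.
interface PopCandidate {
  popId: string;
  lat: number; // degrees
  lon: number; // degrees
  rttMs: number; // round-trip time as seen from the client
}

// Great-circle distance in kilometres (haversine formula, Earth radius 6371 km).
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// What a purely geographic router would pick.
function nearestByDistance(clientLat: number, clientLon: number, pops: PopCandidate[]): PopCandidate {
  if (pops.length === 0) throw new Error('pops must not be empty');
  return pops.reduce((best, p) =>
    haversineKm(clientLat, clientLon, p.lat, p.lon) <
    haversineKm(clientLat, clientLon, best.lat, best.lon)
      ? p
      : best
  );
}

// What a latency-based router would pick.
function fastestByRtt(pops: PopCandidate[]): PopCandidate {
  if (pops.length === 0) throw new Error('pops must not be empty');
  return pops.reduce((best, p) => (p.rttMs < best.rttMs ? p : best));
}

const examplePops: PopCandidate[] = [
  { popId: 'pop-near', lat: 1, lon: 1, rttMs: 40 },
  { popId: 'pop-far', lat: 10, lon: 10, rttMs: 25 },
];

console.log(nearestByDistance(0, 0, examplePops).popId); // pop-near
console.log(fastestByRtt(examplePops).popId); // pop-far
```

With these made-up numbers the two strategies disagree, which is exactly the situation where distance-only routing leaves performance on the table.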

Cloud provider regions and edge locations have proliferated. AWS operates 450+ edge locations, Cloudflare spans 300+ cities, and Fastly maintains 70+ PoPs globally. This density means multiple viable routing options exist for most users, making intelligent selection crucial rather than defaulting to the closest location.

Network conditions change constantly. Peak traffic hours, DDoS attacks, fiber cuts, and routing changes alter latency profiles throughout the day. Static geographic routing cannot adapt to these dynamic conditions, leaving users stuck with suboptimal paths until manual intervention occurs.

Regulatory and data sovereignty requirements complicate routing decisions. GDPR, data localization laws, and other privacy regulations can require content to be stored and served within specific jurisdictions, adding constraints that pure distance-based routing cannot express.

Modern Latency-Based Traffic Management Architecture

Contemporary CDN geo-routing systems implement multi-layered decision engines that combine real-time latency measurements, historical performance data, health checks, and policy constraints to route each request optimally.

The architecture consists of four primary components:

Real-Time Measurement Network: Distributed probes continuously measure latency from various geographic locations to all available PoPs. These measurements feed into a global routing database updated every few seconds.

Intelligent DNS Layer: Authoritative DNS servers receive latency data and make routing decisions at query time, returning IP addresses for the optimal PoP based on the client's resolver location and current network conditions.

Edge Decision Engine: For applications using Anycast or application-layer routing, edge nodes make routing decisions based on connection characteristics, request headers, and real-time performance metrics.

Feedback Loop System: Application-level metrics (actual response times, error rates, throughput) feed back into the routing system to validate and refine routing decisions.

Here's a simplified implementation of a latency-based routing decision engine:

interface LatencyMeasurement {
  popId: string;
  region: string;
  latencyMs: number;
  timestamp: number;
  reliability: number; // 0-1 score based on measurement confidence
}

interface RoutingConstraint {
  type: 'data_residency' | 'compliance' | 'cost' | 'capacity';
  allowedRegions?: string[];
  excludedPops?: string[];
  maxCostPerGb?: number;
}

interface PopMetrics {
  popId: string;
  currentLoad: number; // 0-1 utilization
  availableCapacity: number; // Gbps
  healthScore: number; // 0-1 composite health
  costPerGb: number;
}

class LatencyBasedRouter {
  private latencyCache: Map<string, LatencyMeasurement[]>;
  private popMetrics: Map<string, PopMetrics>;
  private readonly LATENCY_WEIGHT = 0.6;
  private readonly LOAD_WEIGHT = 0.25;
  private readonly RELIABILITY_WEIGHT = 0.15;
  private readonly STALE_THRESHOLD_MS = 30000;

  constructor() {
    this.latencyCache = new Map();
    this.popMetrics = new Map();
  }

  async selectOptimalPop(
    clientLocation: string,
    constraints: RoutingConstraint[],
    contentSize: number // bytes
  ): Promise<string> {
    // The cache holds a history of samples per location. Keep only the newest
    // sample for each PoP so that no PoP is scored (and ranked) more than once.
    const latestByPop = new Map<string, LatencyMeasurement>();
    for (const m of this.getRecentMeasurements(clientLocation)) {
      const seen = latestByPop.get(m.popId);
      if (!seen || m.timestamp > seen.timestamp) {
        latestByPop.set(m.popId, m);
      }
    }

    const eligiblePops = this.filterByConstraints(
      Array.from(latestByPop.values()),
      constraints
    );

    if (eligiblePops.length === 0) {
      throw new Error('No eligible PoPs available for routing constraints');
    }

    const contentSizeGb = contentSize / 1e9;
    const scoredPops: Array<{
      popId: string;
      score: number;
      latency: number;
      load: number;
      availableCapacity: number;
      estimatedCost: number;
    }> = [];

    for (const measurement of eligiblePops) {
      const metrics = this.popMetrics.get(measurement.popId);
      if (!metrics) continue; // No metrics reported yet: skip this PoP

      // Normalize latency score (lower is better, inverted for scoring).
      // Anything at or above 500 ms scores zero.
      const latencyScore = Math.max(0, 1 - (measurement.latencyMs / 500));

      // Load score (lower load is better)
      const loadScore = 1 - metrics.currentLoad;

      // Reliability score from measurement confidence
      const reliabilityScore = measurement.reliability;

      // Composite score with configurable weights
      const totalScore =
        (latencyScore * this.LATENCY_WEIGHT) +
        (loadScore * this.LOAD_WEIGHT) +
        (reliabilityScore * this.RELIABILITY_WEIGHT);

      scoredPops.push({
        popId: measurement.popId,
        score: totalScore,
        latency: measurement.latencyMs,
        load: metrics.currentLoad,
        availableCapacity: metrics.availableCapacity,
        estimatedCost: contentSizeGb * metrics.costPerGb
      });
    }

    if (scoredPops.length === 0) {
      throw new Error('No metrics available for any eligible PoP');
    }

    // Sort by score descending
    scoredPops.sort((a, b) => b.score - a.score);

    // Prefer the best-scoring PoP with enough headroom. availableCapacity is in
    // Gbps and contentSizeGb is in gigabytes, so this is a coarse heuristic
    // rather than an exact capacity check.
    const withCapacity = scoredPops.find(p => p.availableCapacity >= contentSizeGb);

    // If no PoP has headroom, fall back to the top scorer
    return (withCapacity ?? scoredPops[0]).popId;
  }

  private getRecentMeasurements(clientLocation: string): LatencyMeasurement[] {
    const measurements = this.latencyCache.get(clientLocation) || [];
    const now = Date.now();

    return measurements.filter(m => 
      (now - m.timestamp) < this.STALE_THRESHOLD_MS
    );
  }

  private filterByConstraints(
    measurements: LatencyMeasurement[],
    constraints: RoutingConstraint[]
  ): LatencyMeasurement[] {
    return measurements.filter(measurement => {
      const metrics = this.popMetrics.get(measurement.popId);

      return constraints.every(constraint => {
        switch (constraint.type) {
          case 'data_residency':
            // Fail closed: a residency constraint without an allow-list
            // rejects every PoP.
            return constraint.allowedRegions?.includes(measurement.region) ?? false;

          case 'compliance':
            return !(constraint.excludedPops?.includes(measurement.popId) ?? false);

          case 'cost':
            // `??` rather than `||` so a cap of 0 is not treated as "no cap"
            return metrics !== undefined &&
              metrics.costPerGb <= (constraint.maxCostPerGb ?? Infinity);

          case 'capacity':
            // PoPs without metrics are treated as ineligible
            return metrics !== undefined && metrics.currentLoad < 0.85;

          default:
            return true;
        }
      });
    });
  }

  updateLatencyMeasurement(
    clientLocation: string,
    measurement: LatencyMeasurement
  ): void {
    const existing = this.latencyCache.get(clientLocation) || [];

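    // Drop samples older than 5 minutes first, then cap at 50 per location,
    // so expired or out-of-order samples never crowd out fresh ones
    const now = Date.now();
    const updated = [measurement, ...existing]
      .filter(m => (now - m.timestamp) < 300000) // 5 min window
      .slice(0, 50);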

    this.latencyCache.set(clientLocation, updated);
  }

  updatePopMetrics(popId: string, metrics: PopMetrics): void {
    this.popMetrics.set(popId, metrics);
  }
}

This implementation demonstrates several critical patterns for production latency-based routing:

Multi-factor scoring: Latency alone doesn't determine the best route. Load balancing, reliability, and capacity constraints must factor into decisions to prevent cascading failures.

Constraint-based filtering: Data residency and compliance requirements filter eligible PoPs before scoring, ensuring regulatory compliance takes precedence over performance optimization.

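Staleness handling: Network measurements have limited validity. The router ignores measurements older than 30 seconds when selecting a PoP and throws when no eligible PoP remains, so callers need their own fallback strategy (for example, geographic proximity) for when fresh measurements are unavailable.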

Capacity awareness: Even the lowest-latency PoP becomes unsuitable if it lacks capacity to serve the request, requiring fallback logic.
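As a worked example of the multi-factor scoring, the sketch below reproduces the arithmetic with the same weights (0.6 / 0.25 / 0.15) and the same 500 ms normalization ceiling used in the router. `compositeScore` is a hypothetical standalone helper, and the two PoPs are invented inputs.

```typescript
// Same weights and latency ceiling as the router's scoring logic.
const WEIGHTS = { latency: 0.6, load: 0.25, reliability: 0.15 };
const MAX_LATENCY_MS = 500;

// Hypothetical helper mirroring the composite score calculation.
function compositeScore(latencyMs: number, currentLoad: number, reliability: number): number {
  const latencyScore = Math.max(0, 1 - latencyMs / MAX_LATENCY_MS);
  const loadScore = 1 - currentLoad;
  return (
    latencyScore * WEIGHTS.latency +
    loadScore * WEIGHTS.load +
    reliability * WEIGHTS.reliability
  );
}

// PoP A: very fast but nearly saturated.
//   0.92 * 0.6 + 0.10 * 0.25 + 0.9 * 0.15 = 0.552 + 0.025 + 0.135 ≈ 0.712
const scoreA = compositeScore(40, 0.9, 0.9);

// PoP B: twice the latency but lightly loaded.
//   0.84 * 0.6 + 0.70 * 0.25 + 0.9 * 0.15 = 0.504 + 0.175 + 0.135 ≈ 0.814
const scoreB = compositeScore(80, 0.3, 0.9);

console.log(scoreB > scoreA); // true
```

PoP B wins despite doubling the latency, which shows how much a 25% load weight can move the decision. Whether that trade-off is right depends on the workload, so the weights deserve tuning against real traffic.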

Implementing Real-Time Latency Measurement

Accurate latency measurement forms the foundation of effective geo-routing. Modern systems employ multiple measurement strategies:

Active probing from distributed vantage points sends synthetic requests to each PoP every 10-30 seconds. These probes measure TCP handshake time, TLS negotiation, and first-byte latency to build a comprehensive latency map.

Real User Monitoring (RUM) collects actual latency data from production traffic using the Navigation Timing and Resource Timing APIs in browsers. This provides ground truth about user experience but requires sufficient traffic volume for statistical significance.

BGP route analysis examines autonomous system paths and hop counts to predict latency for undersampled routes. This helps estimate performance for new or low-traffic regions.

Here's an implementation of a distributed latency probe system:

interface ProbeResult {
  sourceRegion: string;
  targetPop: string;
  tcpHandshakeMs: number;
  tlsHandshakeMs: number;
  firstByteMs: number;
  totalMs: number;
  success: boolean;
  timestamp: number;
}

class DistributedLatencyProbe {
  private readonly PROBE_INTERVAL_MS = 15000;
  private readonly TIMEOUT_MS = 5000;
  private probeTargets: Map<string, string>; // popId -> endpoint URL

  constructor(targets: Map<string, string>) {
    this.probeTargets = targets;
  }

  async executeProbe(sourceRegion: string, targetPop: string): Promise<ProbeResult> {
    const endpoint = this.probeTargets.get(targetPop);
    if (!endpoint) {
      throw new Error(`No endpoint configured for PoP: ${targetPop}`);
    }

    const startTime = performance.now();
    const timings: Partial<ProbeResult> = {
      sourceRegion,
      targetPop,
      timestamp: Date.now()
    };

    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), this.TIMEOUT_MS);

    try {
      const response = await fetch(endpoint, {
        method: 'HEAD',
        signal: controller.signal,
        cache: 'no-store'
      });

      // Extract timing information from the Resource Timing API. Cross-origin
      // targets expose detailed timings only if they send Timing-Allow-Origin;
      // otherwise those fields are reported as 0.
      const perfEntries = performance.getEntriesByName(endpoint, 'resource');
      const timing = perfEntries[perfEntries.length - 1] as
        PerformanceResourceTiming | undefined;

      if (timing && timing.responseStart > 0) {
        const tlsStart = timing.secureConnectionStart;

        // connectEnd includes the TLS handshake, so the TCP portion ends where
        // TLS begins. Both values are 0 when the connection was reused.
        timings.tcpHandshakeMs =
          (tlsStart > 0 ? tlsStart : timing.connectEnd) - timing.connectStart;
        timings.tlsHandshakeMs = tlsStart > 0 ? timing.connectEnd - tlsStart : 0;
        timings.firstByteMs = timing.responseStart - timing.requestStart;
        timings.totalMs = timing.responseEnd - timing.startTime;
      } else {
        // Fallback to simple wall-clock timing
        timings.totalMs = performance.now() - startTime;
        timings.tcpHandshakeMs = 0;
        timings.tlsHandshakeMs = 0;
        timings.firstByteMs = timings.totalMs;
      }

      timings.success = response.ok;

    } catch (error) {
      // Timeout (abort), DNS failure, or network error
      timings.success = false;
      timings.totalMs = performance.now() - startTime;
      timings.tcpHandshakeMs = 0;
      timings.tlsHandshakeMs = 0;
      timings.firstByteMs = 0;
    } finally {
      // Always clear the timer, including on the error path
      clearTimeout(timeoutId);
    }

    return timings as ProbeResult;
  }

  async probeAllTargets(sourceRegion: string): Promise<ProbeResult[]> {
    const probePromises = Array.from(this.probeTargets.keys()).map(targetPop =>
      this.executeProbe(sourceRegion, targetPop)
    );

    return Promise.all(probePromises);
  }

  startContinuousProbing(
    sourceRegion: string,
    onResults: (results: ProbeResult[]) => void
  ): () => void {
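    const intervalId = setInterval(() => {
      // Catch failures here so a throwing callback cannot surface as an
      // unhandled promise rejection inside the timer
      this.probeAllTargets(sourceRegion)
        .then(onResults)
        .catch(err => console.error('Probe cycle failed:', err));
    }, this.PROBE_INTERVAL_MS);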

    // Return cleanup function
    return () => clearInterval(intervalId);
  }
}

DNS-Based vs Application-Layer Routing

CDN geo-routing operates at two primary layers, each with distinct trade-offs:

DNS-based routing returns different IP addresses based on the client's DNS resolver location. This approach works with any client, requires no application changes, and leverages existing DNS infrastructure. However, DNS caching limits routing agility, resolver location may not match client location (especially with public DNS services like 8.8.8.8), and TTL values create a trade-off between cache efficiency and routing flexibility.

Application-layer routing typically sits behind Anycast IP addresses, which are announced from multiple PoPs so that BGP delivers each connection to one of them; edge nodes then make the final routing decision based on connection characteristics. This enables sub-second routing changes, considers the actual client IP rather than the resolver location, and allows sophisticated request-level routing based on headers, cookies, or authentication state. The downside is increased complexity and the potential for routing loops if edge nodes forward requests to each other without loop protection.

Modern architectures often combine both approaches: DNS provides coarse-grained regional routing while application-layer logic handles fine-grained optimization within regions.

Common Pitfalls and Edge Cases

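Resolver location mismatch: Public DNS services like Cloudflare (1.1.1.1) and Google (8.8.8.8) route queries to their nearest resolver, which may be far from the client. Implement EDNS Client Subnet (ECS) support to receive the client's subnet in DNS queries, enabling more accurate routing. Keep in mind that ECS is optional and some resolvers, including Cloudflare's 1.1.1.1, deliberately omit it for privacy reasons, so resolver-based location remains the fallback.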
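A minimal sketch of the ECS-first lookup, assuming IPv4 only and a hypothetical in-memory table that maps /24 networks to regions. ECS carries a truncated client prefix (commonly a /24 for IPv4), so the lookup truncates whichever address it has. The function names, region labels, and table contents are illustrative; the addresses come from the reserved documentation ranges.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 && parts.every(p => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

function intToIpv4(value: number): string {
  return [24, 16, 8, 0].map(shift => (value >>> shift) & 0xff).join('.');
}

// Zero out the host bits, keeping only the first prefixLen bits.
function truncateToPrefix(ip: string, prefixLen: number): string {
  if (!Number.isInteger(prefixLen) || prefixLen < 0 || prefixLen > 32) {
    throw new Error(`Invalid prefix length: ${prefixLen}`);
  }
  // A shift by 32 is a no-op in JavaScript, so /0 is handled explicitly.
  const mask = prefixLen === 0 ? 0 : (0xffffffff << (32 - prefixLen)) >>> 0;
  return intToIpv4((ipv4ToInt(ip) & mask) >>> 0);
}

// Prefer the ECS client subnet; fall back to the resolver's address.
function regionForQuery(
  ecsClientIp: string | undefined,
  resolverIp: string,
  regionBySubnet: Map<string, string>
): string | undefined {
  const source = ecsClientIp ?? resolverIp;
  return regionBySubnet.get(truncateToPrefix(source, 24));
}

// Hypothetical lookup table keyed by /24 network address.
const regionTable = new Map<string, string>([
  ['203.0.113.0', 'ap-southeast'],
  ['198.51.100.0', 'us-east'],
]);

console.log(regionForQuery('203.0.113.77', '198.51.100.10', regionTable)); // ap-southeast
console.log(regionForQuery(undefined, '198.51.100.10', regionTable)); // us-east
```

The second call shows the mismatch: without ECS, the same client is attributed to the resolver's region.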

Thundering herd during failover: When a PoP fails health checks, all traffic instantly redirects to backup locations, potentially overwhelming them. Implement gradual traffic shifting with rate limiting and circuit breakers to prevent cascading failures.
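One way to avoid overwhelming backups is to spread the failed PoP's traffic in proportion to each backup's remaining headroom and to shed (or queue) whatever does not fit. The sketch below assumes the same 85% utilization ceiling the router's capacity constraint uses; `BackupPop`, `FailoverPlan`, and `planFailover` are hypothetical names, and the traffic figures are invented.

```typescript
interface BackupPop {
  popId: string;
  capacityGbps: number;
  currentGbps: number;
}

interface FailoverPlan {
  extraGbps: Map<string, number>; // additional traffic assigned to each backup
  shedGbps: number; // traffic that no backup can absorb safely
}

const MAX_UTILIZATION = 0.85; // assumption: keep backups below 85% load

function planFailover(failedGbps: number, backups: BackupPop[]): FailoverPlan {
  // Headroom is the traffic a backup can take before hitting the ceiling.
  const headroom = backups.map(b =>
    Math.max(0, b.capacityGbps * MAX_UTILIZATION - b.currentGbps)
  );
  const totalHeadroom = headroom.reduce((sum, h) => sum + h, 0);
  const assignable = Math.min(failedGbps, totalHeadroom);

  const extraGbps = new Map<string, number>();
  backups.forEach((b, i) => {
    const share = totalHeadroom > 0 ? (headroom[i] ?? 0) / totalHeadroom : 0;
    extraGbps.set(b.popId, assignable * share);
  });

  return { extraGbps, shedGbps: failedGbps - assignable };
}

const backups: BackupPop[] = [
  { popId: 'pop-a', capacityGbps: 100, currentGbps: 60 }, // headroom 25
  { popId: 'pop-b', capacityGbps: 50, currentGbps: 30 }, // headroom 12.5
];

// 30 Gbps fits: pop-a takes 20, pop-b takes 10, nothing is shed.
const fits = planFailover(30, backups);

// 50 Gbps does not fit: both backups fill to the ceiling, 12.5 Gbps is shed.
const overflows = planFailover(50, backups);
```

A plan like this gives the target distribution; reaching it in steps rather than all at once is a separate concern.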

Latency measurement bias: Probes from cloud providers' networks experience different latency than end users on residential ISPs. Supplement active probing with RUM data from actual users to validate routing decisions.

Cold cache performance: Routing users to a low-latency PoP with a cold cache may result in worse overall performance than a slightly higher-latency PoP with warm cache. Factor cache hit rates into routing decisions for frequently accessed content.
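A rough way to reason about this is expected latency: every request pays the edge round trip, and cache misses additionally pay an origin fetch. The model below is a deliberate simplification (a single miss penalty, no revalidation or tiered caching), and the hit rates and timings are invented for illustration.

```typescript
// Expected latency = edge RTT + miss probability * origin fetch time.
// Simplified model: assumes one fixed origin fetch penalty per miss.
function expectedLatencyMs(
  edgeRttMs: number,
  cacheHitRate: number,
  originFetchMs: number
): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new Error('cacheHitRate must be between 0 and 1');
  }
  return edgeRttMs + (1 - cacheHitRate) * originFetchMs;
}

// Low-latency PoP with a cold cache: 20 + 0.60 * 200 ≈ 140 ms
const coldPop = expectedLatencyMs(20, 0.4, 200);

// Slower PoP with a warm cache: 35 + 0.05 * 200 ≈ 45 ms
const warmPop = expectedLatencyMs(35, 0.95, 200);

console.log(warmPop < coldPop); // true
```

In this example the PoP that is 15 ms farther away serves requests roughly three times faster on average, because it rarely goes back to origin.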

Cross-region data consistency: Latency-based routing may send subsequent requests from the same user to different PoPs, causing consistency issues for stateful applications. Implement session affinity using consistent hashing or sticky routing for authenticated users.
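As one possible approach to session affinity, the sketch below uses rendezvous (highest-random-weight) hashing, a close relative of ring-based consistent hashing: each session is pinned to the PoP with the highest hash of the session/PoP pair, so removing an unrelated PoP does not move the session. The FNV-1a hash here works on UTF-16 code units, which is adequate for a sketch; `pickPopForSession` and the PoP IDs are hypothetical.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Rendezvous hashing: the PoP with the highest hash(session, pop) wins.
function pickPopForSession(sessionId: string, popIds: string[]): string {
  if (popIds.length === 0) throw new Error('popIds must not be empty');
  let bestPop = popIds[0] as string;
  let bestWeight = fnv1a(`${sessionId}:${bestPop}`);
  for (const popId of popIds.slice(1)) {
    const weight = fnv1a(`${sessionId}:${popId}`);
    if (weight > bestWeight) {
      bestPop = popId;
      bestWeight = weight;
    }
  }
  return bestPop;
}

const eligible = ['pop-sin', 'pop-nrt', 'pop-cgk', 'pop-syd'];
const pinned = pickPopForSession('session-42', eligible);

// Removing a PoP the session was not pinned to leaves the assignment unchanged.
const unrelated = eligible.filter(p => p !== pinned)[0] as string;
const afterRemoval = pickPopForSession(
  'session-42',
  eligible.filter(p => p !== unrelated)
);
console.log(afterRemoval === pinned); // true
```

In practice the candidate list would be the PoPs that pass constraint filtering and sit within an acceptable latency band, so affinity never overrides residency rules.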

Cost optimization conflicts: The lowest-latency PoP may have significantly higher egress costs. Implement cost-aware routing that balances latency against bandwidth costs, especially for large file downloads where latency matters less than throughput.

IPv4 vs IPv6 routing asymmetry: IPv6 and IPv4 networks have different topologies and peering relationships. Measure and route them independently rather than assuming equivalent performance.

Best Practices for Production Deployment

Implement multi-tier fallback logic: Define primary, secondary, and tertiary routing strategies. If latency-based routing fails due to stale data, fall back to geographic proximity, then to least-loaded PoP, and finally to a designated default location.
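The fallback chain can be sketched as an ordered list of strategies, each returning a PoP or `null` when it cannot decide. `RoutingStrategy`, `selectWithFallback`, and the strategy bodies are hypothetical stand-ins; real strategies would consult measurements, GeoIP data, and load metrics.

```typescript
interface RoutingStrategy {
  name: string;
  // Returns a PoP ID, or null when this tier cannot make a decision.
  select: () => string | null;
}

interface RoutingDecision {
  popId: string;
  tier: string; // which strategy produced the decision, useful for logging
}

function selectWithFallback(
  strategies: RoutingStrategy[],
  defaultPop: string
): RoutingDecision {
  for (const strategy of strategies) {
    try {
      const popId = strategy.select();
      if (popId !== null) {
        return { popId, tier: strategy.name };
      }
    } catch {
      // A throwing tier is treated the same as one that cannot decide.
    }
  }
  return { popId: defaultPop, tier: 'default' };
}

// Example: latency data is stale (throws) and geo lookup has no answer,
// so the decision falls through to the least-loaded tier.
const decision = selectWithFallback(
  [
    { name: 'latency', select: () => { throw new Error('measurements are stale'); } },
    { name: 'geo', select: () => null },
    { name: 'least-loaded', select: () => 'pop-b' },
  ],
  'pop-default'
);

console.log(decision); // { popId: 'pop-b', tier: 'least-loaded' }
```

Recording the tier alongside the PoP makes it easy to alert when a large share of traffic is being routed by a fallback rather than by the primary strategy.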

Set appropriate measurement intervals: Balance measurement frequency against overhead. Probe critical paths every 10-15 seconds, secondary paths every 30-60 seconds, and rarely-used paths every 5 minutes.

Use percentile-based routing decisions: Median latency masks variability. Route based on P95 or P99 latency to ensure a consistent user experience rather than optimizing for the typical case.
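To illustrate why the choice of percentile matters, the sketch below computes percentiles with the nearest-rank method over two invented sample sets. `percentile` is a hypothetical helper, and other percentile definitions (for example, linear interpolation) would give slightly different values.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('samples must not be empty');
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// PoP A: usually fast, with an occasional severe outlier.
const popA = [20, 22, 21, 23, 20, 24, 22, 21, 250, 23];

// PoP B: slightly slower but consistent.
const popB = [30, 31, 32, 30, 33, 31, 32, 30, 34, 33];

console.log(percentile(popA, 50), percentile(popB, 50)); // 22 31
console.log(percentile(popA, 95), percentile(popB, 95)); // 250 34
```

By median, PoP A looks better (22 ms vs 31 ms). By P95, PoP B is the clear choice (34 ms vs 250 ms), because one request in ten to PoP A hits the slow path.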

Implement gradual traffic shifting: When routing decisions change, shift traffic gradually over 5-10 minutes rather than instantly. This prevents cache stampedes and allows monitoring for unexpected issues.
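A minimal sketch of a linear ramp: the share of traffic sent to the new PoP grows from 0 to 1 over the ramp window, and each request is assigned by comparing a random draw against that share. `rampWeight` and `choosePop` are hypothetical names; the random value is passed in as a parameter so the behavior is easy to test.

```typescript
// Fraction of traffic that should go to the new PoP, clamped to [0, 1].
function rampWeight(elapsedMs: number, rampDurationMs: number): number {
  if (rampDurationMs <= 0) return 1; // no ramp configured: switch immediately
  return Math.min(1, Math.max(0, elapsedMs / rampDurationMs));
}

// Per-request choice. `randomValue` should be uniform in [0, 1),
// for example Math.random() in production code.
function choosePop(
  oldPop: string,
  newPop: string,
  elapsedMs: number,
  rampDurationMs: number,
  randomValue: number
): string {
  return randomValue < rampWeight(elapsedMs, rampDurationMs) ? newPop : oldPop;
}

const TEN_MINUTES_MS = 10 * 60 * 1000;

// 2.5 minutes into a 10-minute ramp, 25% of requests go to the new PoP.
console.log(rampWeight(150_000, TEN_MINUTES_MS)); // 0.25
console.log(choosePop('pop-old', 'pop-new', 150_000, TEN_MINUTES_MS, 0.1)); // pop-new
console.log(choosePop('pop-old', 'pop-new', 150_000, TEN_MINUTES_MS, 0.5)); // pop-old
```

For DNS-based routing the same weight can drive weighted record responses instead of per-request draws, though resolver caching makes the effective ramp coarser than the configured one.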

Monitor routing decision quality: Track the correlation between predicted latency (from measurements) and actual latency (from RUM). If correlation drops below 0.7, investigate measurement accuracy or network changes.
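The correlation check can be sketched with a plain Pearson coefficient over paired samples of predicted and observed latency. `pearson` and `needsInvestigation` are hypothetical helpers, the sample values are invented, and returning 0 for constant inputs is a simplifying choice (the coefficient is undefined in that case).

```typescript
// Pearson correlation coefficient between two equally sized samples.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two samples of equal length with at least 2 points');
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;

  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < n; i++) {
    const dx = (xs[i] as number) - meanX;
    const dy = (ys[i] as number) - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }

  const denominator = Math.sqrt(varianceX * varianceY);
  // Undefined for constant input; 0 is used here as "no usable signal".
  return denominator === 0 ? 0 : covariance / denominator;
}

const CORRELATION_THRESHOLD = 0.7;

function needsInvestigation(predictedMs: number[], actualMs: number[]): boolean {
  return pearson(predictedMs, actualMs) < CORRELATION_THRESHOLD;
}

// Predictions track reality with a constant offset: correlation is ~1.
console.log(needsInvestigation([20, 40, 60, 80], [25, 45, 65, 85])); // false

// Predictions are inversely related to reality: correlation is ~-1.
console.log(needsInvestigation([20, 40, 60, 80], [85, 65, 45, 25])); // true
```

Note that correlation ignores constant offsets: predictions that are consistently 5 ms too low still correlate perfectly, so an absolute error metric is a useful companion.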

Build comprehensive observability: Instrument routing decisions with structured logging including client location, selected PoP, latency score, constraint violations, and fallback triggers. This enables post-incident analysis and continuous optimization.

Test failover scenarios regularly: Simulate PoP failures, network partitions, and measurement system outages in staging environments. Verify that fallback logic works correctly and doesn't create routing loops.

Document routing policies explicitly: Maintain clear documentation of routing constraints, weight configurations, and business rules. This prevents configuration drift and enables informed decision-making during incidents.

Frequently Asked Questions

What is CDN geo-routing and how does it differ from simple load balancing?

CDN geo-routing directs user requests to specific points of presence based on geographic location and network conditions, while load balancing distributes traffic across servers within a single location. Geo-routing optimizes for latency and data locality across global infrastructure, whereas load balancing focuses on capacity distribution within a region.

How does latency-based routing work in 2025 with modern CDN architectures?

Modern latency-based routing combines real-time network measurements, historical performance data, and machine learning models to predict optimal routing paths. Systems measure actual latency from distributed probes every 10-30 seconds, factor in PoP load and capacity, and apply business constraints like data residency requirements before making routing decisions at the DNS or application layer.

What is the best way to measure latency for CDN routing decisions?

Use a hybrid approach combining active probing from distributed vantage points with Real User Monitoring (RUM) data from production traffic. Active probes provide consistent baseline measurements every 15-30 seconds, while RUM validates routing decisions with actual user experience data. Supplement with BGP route analysis for undersampled paths.

When should you avoid latency-based routing in favor of geographic routing?

Avoid latency-based routing when data residency regulations require strict geographic boundaries, when measurement infrastructure is unreliable, or when traffic volume is too low to generate statistically significant latency data. Geographic routing also makes sense for applications where cache hit rate matters more than raw latency, as routing to the nearest PoP maximizes cache efficiency.

How do you handle DNS caching with dynamic latency-based routing?

Set DNS TTL values between 60 and 300 seconds to balance cache efficiency against routing agility. Implement EDNS Client Subnet to route based on client location rather than resolver location. For applications requiring sub-minute routing changes, use application-layer routing with Anycast instead of relying solely on DNS.

**What are the main challenges when scaling CDN geo
