Skip to main content

Command Palette

Search for a command to run...

Message Queue Reliability: At-Least-Once Delivery

Published
8 min read
T

Welcome to TopperBlog! 👋

I'm a tech content creator passionate about helping developers level up their careers and master cutting-edge technologies.

🎯 What I Write About: • AI/ML Engineering & LLMs • Web3 & Blockchain Development
• System Design & Architecture • Interview Preparation (FAANG) • Freelancing & Remote Work • Modern Tech Stacks (Next.js, React, Rust, TypeScript) • Performance Optimization & Best Practices

💼 Mission: Sharing practical, actionable insights that accelerate your tech career and maximize your earning potential.

📚 15+ In-Depth Guides covering everything from earning $10k/month as a freelancer to cracking FAANG interviews.

🌐 Let's connect and grow together in this amazing tech journey!

#TechBlogger #SoftwareEngineering #CareerGrowth #WebDevelopment #AIEngineering

Message Queue Reliability: At-Least-Once vs Exactly-Once Delivery Semantics

The Duplicate Message Problem in Modern Distributed Systems

In 2026, distributed systems power everything from real-time payment processing to AI model training pipelines. Yet one fundamental challenge persists: ensuring messages arrive reliably without causing duplicate operations. A payment processed twice, inventory decremented incorrectly, or analytics events counted multiple times—these aren't theoretical problems. They're daily realities that cost businesses millions and erode user trust.

The core issue stems from the inherent unreliability of networks. When a message queue acknowledges receipt of a message, that acknowledgment itself can be lost in transit. The sender doesn't know if the message was processed, leading to retries and potential duplicates. Traditional approaches to solving this problem often created more issues than they solved, forcing developers to choose between reliability and correctness.

Why Traditional Approaches Fall Short

Legacy message queue implementations typically offered only at-least-once delivery guarantees. This meant messages would definitely arrive, but possibly multiple times. Developers were left to implement idempotency manually—a complex, error-prone process that varied across every service.

The common pattern involved maintaining a processed message ID cache, usually in a separate database. This created several problems:

Race conditions: Multiple workers could process the same message simultaneously before either wrote to the deduplication store.

Cache invalidation complexity: How long should you keep processed IDs? Too short and you risk duplicates; too long and you waste resources.

Distributed state management: Keeping the deduplication cache synchronized across regions introduced latency and consistency challenges.

Operational overhead: Every service needed its own deduplication logic, multiplying maintenance burden across teams.

Early exactly-once implementations weren't much better. They often relied on distributed transactions (2PC/3PC protocols) that introduced significant latency penalties and reduced system availability. The CAP theorem's constraints meant that guaranteeing exactly-once delivery often came at the cost of partition tolerance—an unacceptable trade-off for modern cloud-native applications.

Modern TypeScript Solution: Implementing Reliable Message Processing

Today's message queue systems like Google Cloud Pub/Sub, Apache Kafka, and AWS SQS offer sophisticated delivery semantics. Let's implement a production-ready solution using TypeScript that handles both at-least-once and exactly-once patterns correctly.

At-Least-Once with Idempotency

import { PubSub, Message } from '@google-cloud/pubsub';
import { Firestore } from '@google-cloud/firestore';

interface PaymentEvent {
  paymentId: string;
  amount: number;
  currency: string;
  timestamp: number;
}

class IdempotentPaymentProcessor {
  private pubsub: PubSub;
  private firestore: Firestore;
  private processedIds: Map<string, number>;

  constructor() {
    this.pubsub = new PubSub();
    this.firestore = new Firestore();
    this.processedIds = new Map();
  }

  async processMessage(message: Message): Promise<void> {
    const event: PaymentEvent = JSON.parse(message.data.toString());
    const idempotencyKey = `payment:${event.paymentId}`;

    try {
      // Atomic check-and-set using Firestore transaction
      await this.firestore.runTransaction(async (transaction) => {
        const docRef = this.firestore
          .collection('processed_messages')
          .doc(idempotencyKey);

        const doc = await transaction.get(docRef);

        if (doc.exists) {
          console.log(`Payment ${event.paymentId} already processed`);
          return; // Idempotent exit
        }

        // Process payment
        await this.executePayment(event);

        // Mark as processed atomically
        transaction.set(docRef, {
          processedAt: Firestore.FieldValue.serverTimestamp(),
          messageId: message.id,
          paymentId: event.paymentId
        });
      });

      message.ack();
    } catch (error) {
      console.error(`Failed to process payment ${event.paymentId}:`, error);
      message.nack(); // Retry later
    }
  }

  private async executePayment(event: PaymentEvent): Promise<void> {
    // Your payment processing logic here
    console.log(`Processing payment: ${event.paymentId} for ${event.amount}`);
  }
}

Exactly-Once with Transactional Outbox

For true exactly-once semantics, implement the transactional outbox pattern:

import { Kafka, Producer, Consumer, EachMessagePayload } from 'kafkajs';
import { Pool } from 'pg';

interface OrderEvent {
  orderId: string;
  customerId: string;
  items: Array<{ sku: string; quantity: number }>;
}

class ExactlyOnceOrderProcessor {
  private kafka: Kafka;
  private producer: Producer;
  private consumer: Consumer;
  private dbPool: Pool;

  constructor() {
    this.kafka = new Kafka({
      clientId: 'order-processor',
      brokers: ['kafka:9092']
    });

    this.producer = this.kafka.producer({
      transactionalId: 'order-processor-tx',
      maxInFlightRequests: 1,
      idempotent: true
    });

    this.consumer = this.kafka.consumer({
      groupId: 'order-processing-group',
      sessionTimeout: 30000
    });

    this.dbPool = new Pool({
      connectionString: process.env.DATABASE_URL
    });
  }

  async initialize(): Promise<void> {
    await this.producer.connect();
    await this.consumer.connect();
    await this.consumer.subscribe({ 
      topic: 'orders', 
      fromBeginning: false 
    });
  }

  async processOrders(): Promise<void> {
    await this.consumer.run({
      eachMessage: async (payload: EachMessagePayload) => {
        await this.handleOrderMessage(payload);
      }
    });
  }

  private async handleOrderMessage(
    payload: EachMessagePayload
  ): Promise<void> {
    const { message, partition, topic } = payload;
    const order: OrderEvent = JSON.parse(message.value!.toString());

    const client = await this.dbPool.connect();

    try {
      await client.query('BEGIN');

      // Start Kafka transaction
      const transaction = await this.producer.transaction();

      try {
        // Check if already processed (using offset tracking)
        const offsetCheck = await client.query(
          'SELECT offset FROM processed_offsets WHERE topic = $1 AND partition = $2',
          [topic, partition]
        );

        const lastProcessedOffset = offsetCheck.rows[0]?.offset || -1;

        if (parseInt(message.offset) <= lastProcessedOffset) {
          console.log(`Message already processed: ${message.offset}`);
          await client.query('COMMIT');
          return;
        }

        // Process order in database
        await client.query(
          'INSERT INTO orders (order_id, customer_id, status) VALUES ($1, $2, $3)',
          [order.orderId, order.customerId, 'PROCESSING']
        );

        // Update inventory
        for (const item of order.items) {
          await client.query(
            'UPDATE inventory SET quantity = quantity - $1 WHERE sku = $2',
            [item.quantity, item.sku]
          );
        }

        // Produce confirmation event transactionally
        await transaction.send({
          topic: 'order-confirmations',
          messages: [{
            key: order.orderId,
            value: JSON.stringify({
              orderId: order.orderId,
              status: 'CONFIRMED',
              timestamp: Date.now()
            })
          }]
        });

        // Update processed offset
        await client.query(
          `INSERT INTO processed_offsets (topic, partition, offset) 
           VALUES ($1, $2, $3) 
           ON CONFLICT (topic, partition) 
           DO UPDATE SET offset = $3`,
          [topic, partition, parseInt(message.offset)]
        );

        // Commit both transactions atomically
        await transaction.commit();
        await client.query('COMMIT');

      } catch (error) {
        await transaction.abort();
        await client.query('ROLLBACK');
        throw error;
      }

    } catch (error) {
      console.error(`Failed to process order ${order.orderId}:`, error);
      throw error; // Will trigger consumer retry
    } finally {
      client.release();
    }
  }
}

Common Pitfalls and How to Avoid Them

Pitfall 1: Ignoring Message Ordering At-least-once delivery doesn't guarantee order. If message B arrives before message A during a retry, your state could become inconsistent. Solution: Use partition keys to ensure related messages go to the same partition, maintaining order within that partition.

Pitfall 2: Unbounded Deduplication Windows Storing every processed message ID forever is unsustainable. Solution: Implement TTL-based cleanup based on your maximum retry window plus a safety margin.

Pitfall 3: Non-Idempotent Side Effects External API calls, file writes, or third-party service interactions may not be idempotent. Solution: Wrap external calls in their own idempotency layer or use the outbox pattern to defer them.

Pitfall 4: Ignoring Poison Messages Messages that consistently fail processing can block your queue. Solution: Implement dead-letter queues with exponential backoff and alerting.

Pitfall 5: Transaction Timeout Mismatches Database transaction timeouts shorter than message processing time cause partial commits. Solution: Tune timeouts consistently across your stack and implement heartbeat mechanisms.

Best Practices for Production Systems

Choose the Right Semantic for Your Use Case: Financial transactions demand exactly-once; analytics events often work fine with at-least-once plus deduplication at query time.

Implement Comprehensive Monitoring: Track duplicate rates, processing latency, dead-letter queue depth, and transaction abort rates. Set up alerts for anomalies.

Design for Failure: Assume every network call will fail. Implement circuit breakers, timeouts, and graceful degradation.

Test Failure Scenarios: Use chaos engineering to simulate network partitions, slow consumers, and broker failures. Verify your system behaves correctly.

Document Idempotency Keys: Make it clear what constitutes a unique operation. Use composite keys when necessary (e.g., userId:action:timestamp).

Version Your Messages: Include schema versions in messages to handle gradual rollouts and backward compatibility.

Implement Observability: Use distributed tracing to follow messages through your system. Tools like OpenTelemetry make this straightforward.

Frequently Asked Questions

Q: When should I use at-least-once vs exactly-once delivery? A: Use at-least-once with idempotent handlers for high-throughput, low-latency scenarios where occasional duplicates are acceptable (logs, metrics, caching). Use exactly-once for financial transactions, inventory management, or any operation where duplicates cause correctness issues.

Q: Does exactly-once delivery impact performance? A: Yes, typically 20-40% throughput reduction due to transactional overhead. However, modern implementations like Kafka's exactly-once semantics (EOS) minimize this impact through optimizations like transaction batching.

Q: How long should I store processed message IDs? A: Set retention to your maximum message retry window plus 2-3x safety margin. For most systems, 7-14 days is sufficient. Use TTL indexes in your database for automatic cleanup.

Q: Can I achieve exactly-once across different message queue systems? A: Not natively. Exactly-once guarantees are typically scoped to a single system. For cross-system exactly-once, implement the saga pattern with compensating transactions.

Q: What happens if my idempotency check fails? A: Treat it as a transient failure and retry. If failures persist, route to a dead-letter queue for investigation. Never silently drop messages.

Q: How do I handle message ordering with multiple consumers? A: Use partition keys to route related messages to the same partition, which is consumed by a single consumer instance. Within a partition, ordering is guaranteed.

Q: Should I implement idempotency at the application or infrastructure level? A: Both. Infrastructure-level (message queue) exactly-once prevents duplicate delivery. Application-level idempotency provides defense-in-depth and handles cases where infrastructure guarantees fail.

Conclusion

Message queue reliability isn't about choosing between at-least-once and exactly-once delivery—it's about understanding the trade-offs and implementing the right pattern for each use case. At-least-once with proper idempotency handles most scenarios efficiently, while exactly-once semantics provide strong guarantees when correctness is paramount.

Modern message queue systems and patterns like transactional outbox have made exactly-once delivery practical for production systems. The TypeScript implementations shown here provide a foundation you can adapt to your specific requirements.

Remember: reliability is a spectrum, not a binary choice. Start with at-least-once delivery and idempotent handlers. Graduate to exactly-once only when business requirements justify the additional complexity. Monitor everything, test failure scenarios rigorously, and always design for the inevitable network partition.

The distributed systems landscape of 2026 demands both high reliability and high performance. By mastering these delivery semantics and implementing them thoughtfully, you'll build systems that are both correct and efficient—the hallmark of excellent engineering.


Metadata

SEO Title: Message Queue Reliability: At-Least-Once vs Exactly-Once Guide

Meta Description: Learn how to implement at-least-once and exactly-once message delivery in distributed systems. TypeScript examples, best practices, and production patterns for 2026.

Primary Keyword: message queue reliability

Secondary Keywords:

  • at-least-once delivery
  • exactly-once semantics
  • idempotent message processing
  • distributed systems reliability
  • message deduplication patterns
  • transactional outbox pattern
  • Kafka exactly-once
  • message queue best practices

Tags:

  • distributed-systems
  • message-queues
  • typescript
  • reliability-engineering
  • kafka
  • pubsub
  • system-design

Word Count: 1,789 words