In distributed systems, we must always assume that individual components can fail.
In general, it is not possible to guarantee that a message is delivered exactly once across an unreliable network.
At-most-once delivery is easy to achieve. In Kafka, set acks=0 on the producer.
The producer sends the message and does not wait for confirmation. If the message arrives, it arrives once. If it does not, there is no retry. So it is delivered at most once.
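To make the fire-and-forget behaviour concrete, here is a minimal sketch that simulates it without any Kafka client. The function name `send_at_most_once`, the `loss_rate` parameter, and the list standing in for the broker's log are all assumptions made for illustration.

```python
import random

def send_at_most_once(messages, loss_rate=0.3, seed=7):
    """Fire-and-forget: each message is sent once and never retried."""
    rng = random.Random(seed)
    log = []  # stands in for what the broker ends up storing
    for msg in messages:
        if rng.random() >= loss_rate:  # the message survived the network
            log.append(msg)
        # No acknowledgment is awaited, so a lost message stays lost.
    return log

log = send_at_most_once(["a", "b", "c", "d", "e"])
print(log)  # a subset of the input, in order, never with duplicates
```

Whatever the loss rate, a message can be missing from the log but can never appear twice.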
At-least-once delivery is also straightforward. Set acks=all on the producer. The producer sends a message and waits for the leader’s acknowledgment. Before acknowledging, the leader waits until all in-sync replicas have received the message. If the producer gets the acknowledgment, the record has been written to every replica that was in sync at that moment.
The in-sync set can shrink to the leader alone, so set min.insync.replicas=2 on the topic to keep a write from being acknowledged by a single surviving broker. If the acknowledgment is lost or times out, the producer retries, and the same record may be written twice.
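The way retries create duplicates can be sketched with a small simulation. It is deliberately simplified: only the acknowledgment can get lost, the request itself always arrives. The names `send_at_least_once`, `ack_loss_rate`, and `max_attempts` are made up for this example.

```python
import random

def send_at_least_once(messages, ack_loss_rate=0.4, seed=3, max_attempts=50):
    """Retry each message until an acknowledgment comes back."""
    rng = random.Random(seed)
    log = []  # stands in for the partition log
    for msg in messages:
        for _ in range(max_attempts):
            log.append(msg)  # the broker stores the record
            if rng.random() >= ack_loss_rate:  # the ack reached the producer
                break
            # The ack was lost. The producer cannot tell whether the write
            # happened, so it sends the same record again.
    return log

log = send_at_least_once(["a", "b", "c"])
print(log)  # every message appears at least once, some possibly more often
```

Every message ends up in the log, but nothing on the broker side stops the same record from being appended twice.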
Exactly-once is harder. We cannot just "turn on" perfect delivery in distributed systems. Still, acks=all and enable.idempotence=true on the producer, combined with min.insync.replicas=2 on the topic, provide exactly-once write semantics per partition for a running producer session.
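Written out as plain dictionaries, the settings look roughly like this. The keys are the standard Kafka property names. The variable names and the broker address are placeholders, and how the dictionaries are handed to a client or an admin tool depends on the library you use.

```python
# Producer-side settings.
idempotent_producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address (assumption)
    "acks": "all",                # wait for all in-sync replicas
    "enable.idempotence": True,   # broker de-duplicates retried records
}

# min.insync.replicas is a topic (or broker) setting, not a producer property.
topic_config = {
    "min.insync.replicas": "2",
}
```

Keeping the two dictionaries apart mirrors where each setting lives: the first goes to the producer, the second is applied when the topic is created or altered.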
How does this work? The broker assigns the producer an ID, and the producer attaches a sequence number to each record. For each partition, the sequence starts at 0 and increments with every message. If the leader receives sequence 2 before sequence 1, it detects a gap and temporarily rejects 2. The producer then retries 1 and 2. Once records arrive in order, they are acknowledged after replication.
What if the producer does not receive the ack? It retries. Since the leader has already stored that sequence number, it does not append the record again and simply acknowledges it.
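The leader's bookkeeping can be sketched as a toy model. This is not Kafka's implementation: the real broker tracks sequence numbers per producer ID and partition and works on batches. The class name `PartitionLeader` and the return strings are made up for illustration.

```python
class PartitionLeader:
    """Toy model of a leader tracking one producer's sequence numbers."""

    def __init__(self):
        self.log = []
        self.next_seq = 0  # next sequence number the leader expects

    def append(self, seq, record):
        if seq < self.next_seq:
            return "ack-duplicate"  # already stored: acknowledge, do not append
        if seq > self.next_seq:
            return "out-of-order"   # gap detected: reject so the producer retries
        self.log.append(record)
        self.next_seq += 1
        return "ack"

leader = PartitionLeader()
results = [
    leader.append(0, "r0"),
    leader.append(2, "r2"),  # arrives before sequence 1
    leader.append(1, "r1"),
    leader.append(2, "r2"),  # the retry now fits
    leader.append(2, "r2"),  # retry after a lost ack
]
print(results)     # ['ack', 'out-of-order', 'ack', 'ack', 'ack-duplicate']
print(leader.log)  # ['r0', 'r1', 'r2']
```

Five requests reach the leader, but the log holds exactly three records in sequence order.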
So idempotence gives you duplicate-free, in-order writes per partition for one producer session, as long as the producer stays alive and no record expires because delivery.timeout.ms has elapsed.
Is this already end-to-end exactly-once? Not yet.
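Idempotence only prevents duplicate writes from a single producer session. For read-process-write pipelines, you also need transactions, so that the consumed offsets and the produced records are committed atomically.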