The producer’s job is to send data to Kafka. Your primary concern should be ensuring that data is delivered reliably, with no loss and no duplicates, while maintaining good performance and keeping resource consumption low.
These two settings are non-negotiable for any production application where data integrity is crucial.
- acks=all: (the default since Kafka 3.0) Ensures that the producer waits until the message has been written to all in-sync replicas.
- enable.idempotence=true: (the default since Kafka 3.0) Ensures that messages are written to the log exactly once. This guarantee survives even broker restarts.
enable.idempotence=true guarantees that the messages produced by a producer are written to Kafka exactly once and in the correct order. It does not guarantee that a consumer will process each message exactly once. For that, you need to use transactions.
If your application code simply retries after a timeout, it is usually better to increase delivery.timeout.ms instead and let the producer handle retries internally. Read more about reliability in this article about reliable producing.
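Putting the reliability settings above together, a minimal producer configuration sketch might look like this (the delivery.timeout.ms value is only an illustrative choice; the default is 120000 ms):

```properties
# Wait for all in-sync replicas before acknowledging
acks=all
# Deduplicate internal retries so each message is written exactly once
enable.idempotence=true
# Instead of retrying in application code, give the producer more time
# to retry internally (illustrative value)
delivery.timeout.ms=300000
```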
Throughput Tuning
For most "medium data" applications, you can get a significant performance boost by simply batching your messages without sacrificing (too much) latency.
- batch.size=900000 (for example): (Default: 16KiB) We believe this default is too small for most use cases. Increase it to up to 1MB. A larger batch allows your producer to send more data in a single network request, reducing overhead and improving throughput.
- linger.ms=5, 10 or 100: (Default: 0ms) This setting works hand in hand with batch.size. It tells the producer to wait up to a certain amount of time for a batch to fill up before sending it. Increasing this to a few milliseconds can significantly improve throughput without a major impact on latency.
- compression.type=lz4 or zstd: (Default: none) Compressing your messages before they are sent to Kafka saves network bandwidth and improves throughput.
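The three throughput settings above could be combined in a producer configuration like this (the concrete values are illustrative starting points, not universal recommendations):

```properties
# Allow up to ~900 KB per batch instead of the 16 KiB default
batch.size=900000
# Wait up to 10 ms for a batch to fill before sending it
linger.ms=10
# Trade a little CPU time for less network traffic
compression.type=lz4
```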
Additional Settings
- transactional.id={hostname}: This setting is only needed if you are using producer transactions. The transactional.id must be unique to each producer instance but persistent across restarts. When a producer instance restarts, Kafka uses this ID to prevent "zombie" producers from committing transactions. In Kubernetes, this is a key reason to use StatefulSets for your transactional producers, as they provide a stable identity.
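As a sketch, a transactional producer on Kubernetes might derive its transactional.id from the stable pod name of a StatefulSet (the name below is hypothetical):

```properties
# Stable per-instance ID, e.g. taken from the StatefulSet pod name ($HOSTNAME),
# so the same ID survives pod restarts and lets Kafka fence zombie producers
transactional.id=orders-producer-0
```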
- partitioner=murmur2_random: (Default: Murmur2Partitioner in Java, consistent_random in librdkafka) The partitioner determines which partition a message is sent to. If you are using both Java and non-Java producers, please ensure that all non-Java producers have the partitioner set to murmur2_random to ensure consistent partitioning across all clients.
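For example, a non-Java client built on librdkafka (such as the Confluent Python or Go clients) could align its partitioning with Java producers like this:

```properties
# librdkafka producer setting: match the Java client's murmur2-based partitioning
partitioner=murmur2_random
```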