Essential Kafka Client Configurations for Production

Kafka's defaults are fine for development, but they're a liability in production. This guide covers the essential configurations to ensure reliability, performance, and operational stability in real-world applications.

Apache Kafka is an incredibly powerful and stable platform. However, many of its out-of-the-box settings are not a solid foundation for production. Many of these defaults are the product of historical decisions and have been kept for the sake of backwards compatibility. In this article, we provide a set of recommendations based on our extensive experience.

This guide is not about hyper-optimizing for the likes of LinkedIn or Netflix, where millions of messages per second are the norm. It’s about a handful of crucial configuration changes that will make your Kafka cluster reliable, robust, and easy to operate in any standard business environment.

Our Recommendations for the Producer Configuration

The producer’s job is to send data to Kafka. Your primary concern should be ensuring that data is delivered reliably, with no loss and no duplicates, while maintaining good performance and keeping resource consumption low.

These two settings are non-negotiable for any production application where data integrity is crucial.

  • acks=all: (the default since Kafka 3.0) Ensures that the producer waits until the message has been written to all in-sync replicas.

  • enable.idempotence=true: (the default since Kafka 3.0) Ensures that messages are delivered exactly once. The guarantee even survives broker restarts.

enable.idempotence=true guarantees that messages produced by a single producer instance are written to Kafka exactly once and in the correct order. It does not guarantee that a consumer will process each message exactly once; for that, you need transactions.

  • delivery.timeout.ms: (default: 120000, i.e. 2 minutes) The producer keeps retrying to send a message for up to this amount of time. After the timeout, it gives up and reports an error.

If your own code simply retries after this timeout anyway, it is usually better to increase delivery.timeout.ms instead. Read more about reliability in this article about reliable producing.

  • client.id={hostname}: (default: auto-generated) Set it for better monitoring and logging. If you are using Kubernetes, set it to the hostname of the pod. A minimal configuration sketch follows below.
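
The following is a minimal sketch of a Java producer that combines these reliability settings. The broker address, topic name, and the hostname lookup via the HOSTNAME environment variable are placeholders you would adapt to your environment.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ReliableProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // placeholder address
            props.put(ProducerConfig.ACKS_CONFIG, "all");                          // wait for all in-sync replicas
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");           // no duplicates, correct order
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");        // keep retrying for 2 minutes
            props.put(ProducerConfig.CLIENT_ID_CONFIG, System.getenv("HOSTNAME")); // pod name on Kubernetes

            try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
                producer.send(new ProducerRecord<>("orders", "key", "value"));     // hypothetical topic and record
            }
        }
    }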

Throughput Tuning

For most "medium data" applications, you can get a significant performance boost by simply batching your messages without sacrificing (too much) latency.

  • batch.size=900000 (for example): (default: 16 KiB) We believe this default is too small for most use cases; increase it to up to 1 MB. A larger batch lets the producer send more data in a single network request, reducing overhead and improving throughput.

  • linger.ms=5, 10 or 100: (default: 0 ms) This setting works hand in hand with batch.size. It tells the producer to wait up to that many milliseconds for a batch to fill up before sending it. A few milliseconds of lingering can significantly improve throughput with only a minor impact on latency.

  • compression.type=lz4 or zstd: (default: none) Compressing messages before they are sent to Kafka saves network bandwidth and improves throughput; see the snippet below.
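
As a rough sketch, these settings extend the producer properties from the example above; the exact values (here 900 KB batches, 10 ms linger, lz4) depend on your message sizes and latency budget.

    props.put(ProducerConfig.BATCH_SIZE_CONFIG, "900000");    // ~900 KB per batch instead of the 16 KiB default
    props.put(ProducerConfig.LINGER_MS_CONFIG, "10");         // wait up to 10 ms for batches to fill
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // or "zstd" for a better compression ratio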

Additional Settings

  • transactional.id={hostname}: This setting is only needed if you are using producer transactions. The transactional.id must be unique per producer instance but stable across restarts. When a producer instance restarts, Kafka uses this ID to prevent "zombie" producers from committing transactions. In Kubernetes, this is a key reason to use StatefulSets for your transactional producers, as they provide a stable identity. A sketch of the transactional flow follows below.

  • partitioner=murmur2_random: (default: murmur2-based hashing in the Java client, consistent_random in librdkafka) The partitioner determines which partition a message is sent to. If you are using both Java and non-Java producers, set the partitioner of all non-Java producers to murmur2_random so that the same key always lands on the same partition.
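
If you do use transactions, the produce flow looks roughly like the following sketch, which extends the producer properties from above; the topic name and record contents are made up for illustration.

    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, System.getenv("HOSTNAME")); // stable identity, e.g. the StatefulSet pod name

    try (KafkaProducer<String, String> producer =
             new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
        producer.initTransactions();      // registers the transactional.id and fences zombie producers
        producer.beginTransaction();
        try {
            producer.send(new ProducerRecord<>("payments", "order-42", "captured")); // hypothetical topic
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction();  // read_committed consumers never see the aborted records
            throw new RuntimeException(e);
        }
    }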

Consumer Configuration

A consumer’s job is to read and process data from Kafka topics. Proper configuration is key to building a stable, resilient, and manageable application.

  • group.id={service-name}: All consumers that belong to the same group.id will work together as a single unit to consume messages from a topic. Use a clear, descriptive name for your consumer group, such as the name of your service.

  • client.id={hostname}: Similar to the producer, giving your consumer a meaningful ID is essential for monitoring and debugging.

  • isolation.level=read_committed: (default: read_uncommitted) Makes sure that your consumer code never sees messages from uncommitted or aborted transactions. It is a best practice to set this on all consumers by default, even if you are not currently using transactions.

If your producer is transactional and your consumer is configured with isolation.level=read_uncommitted, the consumer can read data from transactions that were aborted, which leads to severe data integrity issues. Always use read_committed.

  • auto.offset.reset=earliest: (default: latest) This setting dictates what happens when a consumer starts up and cannot find a valid offset for a topic partition. If you want your consumers to read all messages from the beginning, set it to earliest. If you are OK with missing messages produced before the group first committed offsets, set it to latest.

  • enable.auto.commit=true: (default: true) Handles automatic offset commits. Disable this only if you know exactly what you are doing (like when handling transactions manually).

  • auto.commit.interval.ms=5000: (default: 5000) This setting determines how often the consumer’s offsets are committed to Kafka. The default of 5 seconds means that, in the event of a crash, at most about 5 seconds of data will be reprocessed. Do not change this unless you have a very specific reason to do so. A consumer sketch combining these settings follows below.
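
Put together, a consumer configuration along these lines reflects the recommendations above; the group name, broker address, topic, and hostname lookup are placeholders.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class OrderServiceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // placeholder address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-service");            // descriptive service name
            props.put(ConsumerConfig.CLIENT_ID_CONFIG, System.getenv("HOSTNAME")); // pod name on Kubernetes
            props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");            // default, kept explicit
            props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");       // default, kept explicit

            try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
                consumer.subscribe(List.of("orders"));                              // hypothetical topic
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }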

If you need exactly-once processing, you either need a consume-process-produce loop with transactions (sketched below) or Kafka Streams with the setting processing.guarantee=exactly_once_v2.
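
A sketch of the consume-process-produce pattern could look like this. It combines the transactional producer and the consumer from the earlier examples; the output topic and the process method are hypothetical, the consumer needs enable.auto.commit=false, and offsets are committed through the producer as part of the transaction.

    producer.initTransactions();
    consumer.subscribe(List.of("orders"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        if (records.isEmpty()) continue;
        producer.beginTransaction();
        try {
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                producer.send(new ProducerRecord<>("orders-enriched", record.key(), process(record.value())));
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
            // Commit the consumed offsets inside the same transaction instead of via the consumer
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction();  // the batch will be re-read and reprocessed
        }
    }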

Avoiding Rebalances in Kubernetes and Improving Cost-Efficiency

Rebalances happen when consumers join or leave a consumer group. In automated infrastructures like Kubernetes, pod failures are handled by the platform, so we can often avoid expensive rebalances in Kafka.

  • group.instance.id={hostname}: Makes sure that a consumer "remembers" its previous partition assignment after a restart and thus avoids an expensive rebalance. But: the consumer must come back within session.timeout.ms, otherwise a rebalance is triggered anyway.

  • session.timeout.ms=60000 (for example): (default: 45000) This setting defines the maximum time a consumer can be unavailable before it is considered failed and the consumer group rebalances. Increase it to a value that gives the infrastructure enough time to detect the failure, restart the pod, and let the consumer rejoin. Only do this together with group.instance.id.

group.instance.id and session.timeout.ms should be used together, and they only make sense if the identity of the pod survives the restart, for example with a StatefulSet for your consumers in Kubernetes.

  • group.protocol=consumer: (default: classic) Use the next generation of the consumer rebalance protocol (see KIP-848 for more information). Requires Kafka 4.0 or higher and makes rebalancing much faster.

If your Kafka cluster is older than 4.0, you cannot use the new consumer rebalance protocol and have to stay on the classic protocol. You can still speed up rebalances by setting partition.assignment.strategy=CooperativeStickyAssignor (default: RangeAssignor, CooperativeStickyAssignor).

  • client.rack={dc-name}: If you are running your Kafka cluster and your clients in a multi-availability-zone (AZ) cloud environment, set broker.rack on the brokers and client.rack on the consumers. The consumer then prefers reading from a replica in the same AZ, which can significantly reduce inter-AZ data transfer costs. A snippet with these settings follows below.
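
A sketch of the additional consumer properties for this setup, extending the consumer example above; the rack name is a placeholder for your availability zone.

    props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, System.getenv("HOSTNAME")); // stable StatefulSet pod name
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "60000");                  // give Kubernetes time to restart the pod
    props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "eu-central-1a");                 // placeholder: the AZ this pod runs in
    // On Kafka 4.0+ you can switch to the new rebalance protocol instead;
    // the session timeout is then managed on the broker side:
    // props.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, "consumer");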

Topic Configuration

A topic’s configuration is the blueprint for how Kafka handles your data. The correct settings here are fundamental to the reliability and performance of your entire system.

Reliability and Performance

  • num.partitions=12: (default: 1) 12 is a good default for the partition count because it can be divided evenly among 1, 2, 3, 4, 6, or 12 consumers. Read this article for more information.

  • replication-factor=3: (default: 1) The number of times a partition’s data is replicated across brokers. 3 is the gold standard for most production environments, providing high availability at a reasonable cost.

  • min.insync.replicas=2: (default: 1) The number of in-sync replicas that must be available for producers to write to a partition. With a replication factor of 3, setting this to 2 means that one broker can fail without affecting production and another one can fail without losing data. An example of creating such a topic with the Admin client follows below.
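
One way to apply these settings is at topic creation time with the Java Admin client; the topic name and bootstrap address are placeholders in this sketch.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateOrdersTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");   // placeholder address

            try (Admin admin = Admin.create(props)) {
                NewTopic topic = new NewTopic("orders", 12, (short) 3)             // 12 partitions, replication factor 3
                        .configs(Map.of("min.insync.replicas", "2"));
                admin.createTopics(List.of(topic)).all().get();
            }
        }
    }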

Data Retention and Lifecycle

  • cleanup.policy: (default: delete) This defines how Kafka handles old messages. The default is delete, which means messages are deleted after a set period. For use cases like change data capture or event sourcing, you might use compact, which removes old messages for the same key.

  • retention.ms: (default: 604800000, i.e. 7 days) This setting controls how long Kafka retains messages before they are deleted. The default is 7 days, but you should adjust it based on your business requirements; an example of changing it on an existing topic follows below.
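
Reusing the Admin client from the previous example, the retention of an existing topic could be adjusted roughly like this; the topic name and the 3-day value are illustrative.

    ConfigResource ordersTopic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
    AlterConfigOp setRetention = new AlterConfigOp(
            new ConfigEntry("retention.ms", "259200000"),   // 3 days, purely illustrative
            AlterConfigOp.OpType.SET);
    admin.incrementalAlterConfigs(Map.of(ordersTopic, List.of(setRetention))).all().get();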

Conclusion

Sadly, in Kafka, the default settings are often not "good enough" for production. I hope this article helped you to find the right settings for your production environment. If you have any questions, please feel free to contact us any time.

About Anatoly Zelenin
Hi, I’m Anatoly! I love to spark that twinkle in people’s eyes. As an Apache Kafka expert and book author, I’ve been bringing IT to life for over a decade—with passion instead of boredom, with real experiences instead of endless slides.
