Apache Kafka Performance Tuning

Kafka - Performance Tuning

Test Environment Setup

Follow the instructions from our earlier blog post ‘Setting up a Kafka test environment’
https://docs.google.com/document/d/1h10XYQhaNSYvFvQvla4N2KIlIM9kLNNAO6HUvgOTSIA/edit?tab=t.0

Operating System

Apache Kafka runs on the Java Virtual Machine (JVM) and thus supports all major operating systems. We have tested this environment on macOS 15.3.2 Sequoia on Apple Silicon. On Windows, we recommend using the Windows Subsystem for Linux (WSL).

Performance Tuning - Producer Side

The bin/kafka-producer-perf-test.sh is a performance testing script included in Apache Kafka installations. It’s designed to benchmark and evaluate the performance of Kafka producers.

The script accepts several parameters to customize the test. Below are the most commonly used ones, followed by three producer properties that are passed via --producer-props rather than as flags of their own:

Parameter	Description	Example Value
--topic	The Kafka topic to send messages to	test-topic
--num-records	The total number of messages to send	1000000
--record-size	The size of each message (in bytes)	100
--throughput	The target throughput (messages per second). Use -1 for no limit	1000 or -1
--producer-props	Kafka producer configuration properties (e.g., bootstrap servers, acks)	bootstrap.servers=localhost:9092 acks=1
--producer.config	Path to a properties file containing producer configurations	producer.properties
--print-metrics	If specified, prints detailed producer metrics after the test	(Flag, no value needed)
--payload-file	Path to a file containing message payloads (instead of random data)	/path/to/payload.txt
compression.type	Compression codec for messages: none, gzip, snappy, lz4, zstd (producer property, passed via --producer-props)	gzip
linger.ms	Time in ms to buffer data before sending (producer property, passed via --producer-props)	5
batch.size	Batch size in bytes for the producer (producer property, passed via --producer-props)	16384
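
When the same benchmark is repeated with many different settings, it can help to assemble the command line programmatically. The following is a minimal sketch, not something shipped with Kafka: the helper name build_producer_perf_cmd and its defaults are assumptions for illustration, and it only uses flags from the table above.

```python
import shlex


def build_producer_perf_cmd(topic, num_records, record_size, throughput=-1,
                            producer_props=None, print_metrics=False,
                            script="kafka-producer-perf-test.sh"):
    """Return the argv list for one kafka-producer-perf-test.sh run.

    Hypothetical helper: it only assembles flags from the table above.
    """
    cmd = [
        script,
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),
        "--throughput", str(throughput),
    ]
    if producer_props:
        # --producer-props takes space-separated key=value pairs
        cmd.append("--producer-props")
        cmd.extend(f"{key}={value}" for key, value in producer_props.items())
    if print_metrics:
        cmd.append("--print-metrics")
    return cmd


cmd = build_producer_perf_cmd(
    "test-topic", 1_000_000, 100,
    producer_props={"bootstrap.servers": "localhost:9092", "acks": "all"},
)
# Print the command as a shell-safe string for copy and paste
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd) unchanged, which avoids quoting problems when property values contain special characters.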

Acks

When working with Apache Kafka, one configuration that often gets attention is the “acks” (acknowledgment) setting. While it might seem like a tuning knob for performance, in most cases, this isn’t a decision you should be constantly weighing. The recommended setting is acks=all, which ensures maximum data durability by requiring acknowledgments from all in-sync replicas. Unless you have a very specific reason and understand the trade-offs, deviating from this default is generally not advised.

To test this, we create a topic with 3 partitions and a replication factor of 3:

$ kafka-topics.sh --create --topic test-ack --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

Now test different acks values. The setting can be passed via the --producer-props parameter:

$ kafka-producer-perf-test.sh \
--topic test-ack \
--num-records 100000 \
--record-size 100 \
--throughput -1 \
--producer-props bootstrap.servers=localhost:9092 acks=0

There are three possible values and here are the results of the testing:

acks=0: “Fire and forget” - the producer doesn’t wait for any acknowledgment
100000 records sent, 378787.9 records/sec (36.12 MB/sec), 1.42 ms avg latency, 140.00 ms max latency, 1 ms 50th, 6 ms 95th, 10 ms 99th, 11 ms 99.9th.

acks=1: The producer waits for acknowledgment from the leader broker only
100000 records sent, 369003.7 records/sec (35.19 MB/sec), 7.71 ms avg latency, 127.00 ms max latency, 7 ms 50th, 17 ms 95th, 20 ms 99th, 21 ms 99.9th.

acks=all (or acks=-1): The producer waits for acknowledgment from the leader and all in-sync replicas
100000 records sent, 234192.0 records/sec (22.33 MB/sec), 142.01 ms avg latency, 237.00 ms max latency, 139 ms 50th, 178 ms 95th, 183 ms 99th, 203 ms 99.9th.

Performance Impact: Higher acks settings require more confirmation steps before a producer can proceed, which increases latency and reduces throughput. However, these additional steps provide stronger guarantees about message persistence.
Reliability Impact: Lower acks settings provide better performance but at the risk of potential message loss during broker failures. Higher acks settings ensure that your data is safely replicated before acknowledging the producer.
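
To compare runs without reading the numbers by eye, the summary lines can be parsed. This is a minimal sketch: the regular expression is written against the summary lines shown above and is an assumption about their format, not an official parser; parse_summary is a name chosen for this example.

```python
import re

# Matches the first part of a kafka-producer-perf-test.sh summary line
SUMMARY_RE = re.compile(
    r"(?P<records>\d+) records sent, "
    r"(?P<rps>[\d.]+) records/sec \((?P<mbps>[\d.]+) MB/sec\), "
    r"(?P<avg_ms>[\d.]+) ms avg latency, "
    r"(?P<max_ms>[\d.]+) ms max latency"
)


def parse_summary(line):
    """Extract throughput and latency figures from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return {
        "records": int(match["records"]),
        "records_per_sec": float(match["rps"]),
        "mb_per_sec": float(match["mbps"]),
        "avg_latency_ms": float(match["avg_ms"]),
        "max_latency_ms": float(match["max_ms"]),
    }


# Summary lines copied from the results above
results = {
    "acks=0": "100000 records sent, 378787.9 records/sec (36.12 MB/sec), 1.42 ms avg latency, 140.00 ms max latency, 1 ms 50th, 6 ms 95th, 10 ms 99th, 11 ms 99.9th.",
    "acks=1": "100000 records sent, 369003.7 records/sec (35.19 MB/sec), 7.71 ms avg latency, 127.00 ms max latency, 7 ms 50th, 17 ms 95th, 20 ms 99th, 21 ms 99.9th.",
    "acks=all": "100000 records sent, 234192.0 records/sec (22.33 MB/sec), 142.01 ms avg latency, 237.00 ms max latency, 139 ms 50th, 178 ms 95th, 183 ms 99th, 203 ms 99.9th.",
}

baseline = parse_summary(results["acks=0"])
for name, line in results.items():
    parsed = parse_summary(line)
    relative = parsed["records_per_sec"] / baseline["records_per_sec"]
    print(f"{name}: {relative:.0%} of acks=0 throughput, "
          f"{parsed['avg_latency_ms']} ms avg latency")
```

In this run, acks=all reaches roughly 62% of the acks=0 throughput, while the average latency rises from about 1.4 ms to about 142 ms.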

⚠️ Heads-up: Again, unless you know exactly what you’re doing, don’t touch the acks setting (default: acks=all). There are far better places to tune for performance, such as your code, batching, compression, and parallelism.

Batch Size

In Kafka, producers batch multiple records together before sending them to the broker.

Here is how we can play with batch size

$ kafka-producer-perf-test.sh \
--topic test-ack \
--num-records 100000 \
--record-size 100 \
--throughput -1 \
--producer-props bootstrap.servers=localhost:9092 \
acks=all \
batch.size=1024

Here are the test results for different batch sizes

Small Batch (1KB / batch.size=1024)
100000 records sent, 24003.8 records/sec (2.29 MB/sec), 2114.60 ms avg latency, 3894.00 ms max latency, 2147 ms 50th, 3683 ms 95th, 3848 ms 99th, 3889 ms 99.9th.

Medium Batch (16KB / batch.size=16384)
100000 records sent, 256410.3 records/sec (24.45 MB/sec), 34.48 ms avg latency, 136.00 ms max latency, 23 ms 50th, 77 ms 95th, 79 ms 99th, 81 ms 99.9th.

Medium Batch (64KB / batch.size=65536)
100000 records sent, 314465.4 records/sec (29.99 MB/sec), 4.84 ms avg latency, 163.00 ms max latency, 3 ms 50th, 16 ms 95th, 19 ms 99th, 20 ms 99.9th.

Large Batch (1MB / batch.size=1048576)
100000 records sent, 347222.2 records/sec (33.11 MB/sec), 12.69 ms avg latency, 155.00 ms max latency, 13 ms 50th, 20 ms 95th, 22 ms 99th, 23 ms 99.9th.

Larger batch = better throughput, fewer requests over the network.
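
A rough back-of-the-envelope calculation shows why the 1KB batch is so slow. The sketch below is an estimate under stated assumptions: it ignores record and batch header overhead and compression, and it ignores that batches are built per partition, so the real number of records per batch is lower and the real number of batches is higher. The function name estimate_batches is made up for this example.

```python
import math


def estimate_batches(num_records, record_size, batch_size):
    """Upper bound on records per batch and lower bound on batch count.

    Assumption: no per-record or per-batch overhead, a single partition.
    """
    records_per_batch = max(1, batch_size // record_size)
    batches = math.ceil(num_records / records_per_batch)
    return records_per_batch, batches


# Same workload as in the tests above: 100000 records of 100 bytes
for batch_size in (1024, 16384, 65536, 1048576):
    per_batch, batches = estimate_batches(100_000, 100, batch_size)
    print(f"batch.size={batch_size}: at most {per_batch} records/batch, "
          f"at least {batches} batches")
```

With batch.size=1024, at most 10 records fit into a batch, so the 100000 records need at least 10000 batches. With batch.size=16384 the same data fits into roughly 600 batches.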

Normally, the producer sends a batch as soon as it is full (based on batch.size). But if the batch isn’t full and linger.ms=0 (the default before Kafka 4.0), the producer sends the messages right away, resulting in many small batches.
By increasing linger.ms, you let the producer wait briefly to collect more messages per batch.
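
The interplay between batch capacity and linger.ms can be illustrated with a toy model. This is a simplified sketch and an assumption about the behaviour, not the real producer logic: here a batch is closed when it is full or when linger_ms has passed since its first record arrived. The function name simulate_batches is hypothetical.

```python
def simulate_batches(arrival_times_ms, max_records_per_batch, linger_ms):
    """Group record arrival times (in ms) into batches.

    Toy model: a batch closes when it is full or when linger_ms has
    passed since its first record arrived.
    """
    batches = []
    current = []
    for t in arrival_times_ms:
        expired = bool(current) and t - current[0] > linger_ms
        full = len(current) >= max_records_per_batch
        if current and (expired or full):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# One record per millisecond for 100 ms, up to 50 records per batch
arrivals = list(range(100))
for linger in (0, 5, 20):
    batches = simulate_batches(arrivals, max_records_per_batch=50, linger_ms=linger)
    print(f"linger.ms={linger}: {len(batches)} batches")
```

In this model, the slow trickle of one record per millisecond produces 100 single-record batches with linger.ms=0, 17 batches with linger.ms=5, and 5 batches with linger.ms=20. The price is that a record may wait up to linger.ms before it is sent.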

Large Batch (1MB / batch.size=1048576) with linger.ms=1000

100000 records sent, 341296.9 records/sec (32.55 MB/sec), 22.94 ms avg latency, 179.00 ms max latency, 22 ms 50th, 37 ms 95th, 42 ms 99th, 48 ms 99.9th.

Number of Producers

More producers generally = more throughput up to a point.

But too many producers can lead to:

Broker CPU pressure
Disk I/O contention
Higher latency

Create a topic with 3 partitions and a replication factor of 3

$ kafka-topics.sh --create --topic test-many-producers --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

Create a bash script called ‘start_producers.sh’ with the following content
https://gist.github.com/arockianirmal26/aac27c3e1fa08c733eb7f2b7afbaf221

Call the script to start just one producer
$ ./start_producers.sh 1

The detailed log is saved in the file ‘producer_1.log’. Now call the script to start three producers in parallel. The detailed log is saved in the file ‘producer_3.log’. Below are the observations

One Producer
Producer 1:
100000 records sent, 346020.8 records/sec (33.00 MB/sec), 6.19 ms avg latency, 128.00 ms max latency, 5 ms 50th, 13 ms 95th, 19 ms 99th, 20 ms 99.9th.

Three Producers
Producer 1:
33333 records sent, 120335.7 records/sec (11.48 MB/sec), 17.87 ms avg latency, 136.00 ms max latency, 16 ms 50th, 33 ms 95th, 36 ms 99th, 38 ms 99.9th.
Producer 2:
33333 records sent, 118202.1 records/sec (11.27 MB/sec), 22.60 ms avg latency, 152.00 ms max latency, 27 ms 50th, 39 ms 95th, 41 ms 99th, 42 ms 99.9th.
Producer 3:
33333 records sent, 118622.8 records/sec (11.31 MB/sec), 21.70 ms avg latency, 149.00 ms max latency, 24 ms 50th, 37 ms 95th, 39 ms 99th, 40 ms 99.9th.

Kafka throughput can scale with the number of producers, but usually only when they are distributed across multiple machines. Running many producers on the same host often leads to diminishing returns due to shared resource limitations.
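
Adding up the per-producer numbers makes the diminishing returns visible. The short calculation below only uses the records/sec values reported in the logs above; the variable names are chosen for this example.

```python
# records/sec reported by the runs above
single_producer = 346020.8
three_producers = [120335.7, 118202.1, 118622.8]

# Combined throughput of the three parallel producers
aggregate = sum(three_producers)
# Relative gain compared with the single-producer run
gain = aggregate / single_producer - 1

print(f"aggregate: {aggregate:.1f} records/sec")
print(f"gain over one producer: {gain:.1%}")
```

On this single machine, three producers together deliver only about 3% more than one producer, while the average latency per producer roughly triples.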

Compression

Choosing the right compression method in Apache Kafka can significantly impact performance, latency, and throughput. Compression type can be added to the producer props like below

$ kafka-producer-perf-test.sh \
--topic my-topic \
--num-records 100000 \
--record-size 100 \
--throughput -1 \
--producer-props bootstrap.servers=localhost:9092 acks=all batch.size=16384 compression.type=snappy

Here are our test results

None
100000 records sent, 357142.9 records/sec (34.06 MB/sec), 12.62 ms avg latency, 135.00 ms max latency, 12 ms 50th, 18 ms 95th, 21 ms 99th, 22 ms 99.9th.

Snappy
100000 records sent, 231481.5 records/sec (22.08 MB/sec), 17.59 ms avg latency, 279.00 ms max latency, 19 ms 50th, 27 ms 95th, 28 ms 99th, 29 ms 99.9th.

Gzip
100000 records sent, 235849.1 records/sec (22.49 MB/sec), 2.01 ms avg latency, 134.00 ms max latency, 2 ms 50th, 4 ms 95th, 15 ms 99th, 17 ms 99.9th.

LZ4
100000 records sent, 215517.2 records/sec (20.55 MB/sec), 8.25 ms avg latency, 322.00 ms max latency, 9 ms 50th, 14 ms 95th, 19 ms 99th, 24 ms 99.9th.

Zstd
100000 records sent, 184501.8 records/sec (17.60 MB/sec), 2.30 ms avg latency, 351.00 ms max latency, 2 ms 50th, 6 ms 95th, 15 ms 99th, 18 ms 99.9th.

No Compression (default): This setting can be acceptable if your data is poorly compressible (e.g., already compacted logs or highly random payloads). However, in most cases, enabling compression is beneficial, as it reduces network usage and improves throughput, especially when data can be compressed efficiently.
Snappy: Best for low-latency needs with moderate compression. Suitable for real-time applications where speed is critical but some bandwidth savings are desired.
Gzip: Gzip provides the best compression ratio and excellent latency, but it’s CPU-intensive. Ideal for low-bandwidth environments or when message size is critical
LZ4: LZ4 offers better compression than Snappy and faster decompression, making it a strong choice when both speed and size matter.
Zstd: Optimal for cases prioritizing storage efficiency and low latency over throughput. Perfect for large datasets where bandwidth is a bottleneck.
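
How much compression helps depends heavily on the data. The sketch below uses only the Python standard library, so it covers DEFLATE (the algorithm behind gzip) via zlib and leaves out Snappy, LZ4, and Zstd. The sample records are made up for illustration and the function name deflate_ratio is hypothetical. Keep in mind that Kafka compresses whole batches, so larger batches of similar records tend to compress better.

```python
import json
import random
import zlib


def deflate_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Repetitive JSON records as a stand-in for typical event data
records = [{"sensor": "s-1", "status": "OK", "value": i % 10} for i in range(200)]
structured = json.dumps(records).encode("utf-8")

# Random bytes of the same length as a stand-in for incompressible data
rng = random.Random(42)
random_payload = bytes(rng.getrandbits(8) for _ in range(len(structured)))

print(f"structured: {deflate_ratio(structured):.2f}")
print(f"random:     {deflate_ratio(random_payload):.2f}")
```

The structured payload shrinks to a small fraction of its size, while the random payload does not shrink at all. A benchmark that sends random payloads therefore mostly measures the CPU cost of compression, not its bandwidth benefit.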

⚠️ Heads-up: When in doubt, use LZ4. It’s the best option for general-purpose Kafka performance

Transactions

When using Kafka transactions, the number of messages committed per transaction significantly impacts overall throughput and latency. Our benchmarks show that going from one message per transaction to 50-100 messages per transaction improves throughput considerably, because the commit overhead is spread across more messages.
However, larger transactions delay the point at which individual messages are committed, and more messages have to be sent again if a transaction is aborted. Finding the optimal transaction size requires balancing throughput requirements against latency constraints for your specific use case.

Create a topic

$ kafka-topics.sh --create --topic transaction-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

Then we create a Python script ‘tx_benchmark.py’ with the following content. When calling the script, we pass the number of messages per transaction and the total number of messages to produce into the topic as parameters.
We use confluent_kafka for this.

$ pip3 install confluent-kafka

https://gist.github.com/arockianirmal26/f146504b3c04df7093b0bb10b6abd83e

Without Transactions
$ python tx_benchmark.py --total-messages 1000

NON-TRANSACTIONAL BENCHMARK RESULTS

Total messages sent: 1000
Total data sent: 0.98 MB
Total duration: 0.01 seconds
Throughput: 86534.02 messages/second
Data rate: 84.75 MB/second

1 Message per Transaction
$ python tx_benchmark.py --msgs-per-tx 1 --total-messages 1000

TRANSACTIONAL BENCHMARK RESULTS

Messages per transaction: 1
Total messages sent: 1000
Total data sent: 0.98 MB
Total transactions: 1000
Total duration: 35.61 seconds
Throughput: 28.09 messages/second
Data rate: 0.03 MB/second
Transaction rate: 28.09 transactions/second
Average transaction time: 35.59 ms
Median transaction time: 31.66 ms
Min transaction time: 26.46 ms
Max transaction time: 1022.19 ms

50 Messages per Transaction
$ python tx_benchmark.py --msgs-per-tx 50 --total-messages 1000

TRANSACTIONAL BENCHMARK RESULTS

Messages per transaction: 50
Total messages sent: 1000
Total data sent: 0.98 MB
Total transactions: 20
Total duration: 1.69 seconds
Throughput: 591.09 messages/second
Data rate: 0.58 MB/second
Transaction rate: 11.82 transactions/second
Average transaction time: 84.53 ms
Median transaction time: 35.05 ms
Min transaction time: 30.26 ms
Max transaction time: 1002.07 ms

100 Messages per Transaction
$ python tx_benchmark.py --msgs-per-tx 100 --total-messages 1000

TRANSACTIONAL BENCHMARK RESULTS

Messages per transaction: 100
Total messages sent: 1000
Total data sent: 0.98 MB
Total transactions: 10
Total duration: 0.85 seconds
Throughput: 1180.07 messages/second
Data rate: 1.16 MB/second
Transaction rate: 11.80 transactions/second
Average transaction time: 84.62 ms
Median transaction time: 36.36 ms
Min transaction time: 31.45 ms
Max transaction time: 507.50 ms
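
The effect of the transaction size can be summarized with a few lines of arithmetic. The sketch below recomputes commit counts and throughput from the rounded durations reported above, so the results differ slightly from the script output; tx_summary is a name chosen for this example.

```python
import math


def tx_summary(total_messages, msgs_per_tx, duration_s):
    """Return (number of commits, messages per second) for one run."""
    transactions = math.ceil(total_messages / msgs_per_tx)
    throughput = total_messages / duration_s
    return transactions, throughput


# messages per transaction -> total duration in seconds (from the runs above)
runs = {1: 35.61, 50: 1.69, 100: 0.85}
total_messages = 1000

_, baseline = tx_summary(total_messages, 1, runs[1])
for msgs_per_tx, duration in runs.items():
    transactions, throughput = tx_summary(total_messages, msgs_per_tx, duration)
    print(f"{msgs_per_tx:>3} msgs/tx: {transactions:>4} commits, "
          f"{throughput:7.1f} msgs/sec, {throughput / baseline:4.1f}x")
```

The median time per transaction stays in the 30 to 36 ms range in all three runs, so packing 50 or 100 messages into each commit raises throughput by a factor of roughly 21 and 42 respectively.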

Different Message Formats

Kafka is payload-agnostic, allowing you to send virtually any data format. Our benchmark compares five common serialization formats

JSON: Human-readable and flexible, ideal for debugging and interoperability. Best for when schema evolution and readability are priorities, but less efficient for high-throughput applications.

CSV: Simple and compact text format with low overhead. Perfect for tabular data and legacy system integration, though lacks nested structure support.

Avro: Schema-based binary format offering strong typing and compression. Excellent for high-throughput systems requiring strict schemas, backwards compatibility, and reduced network usage.

Binary: Raw binary data with minimal overhead. Optimal for maximum throughput when working with binary data like images or when implementing custom serialization.

String: Plain text format with no specific structure. Useful for simple messages or when consumers need text processing capabilities.
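
To make the difference between the formats concrete, the sketch below encodes one record as JSON, CSV, and a fixed binary layout using only the Python standard library. The record and its field names are made up for illustration, and Avro is left out because it needs a third-party library and a schema. The benchmark below uses messages of about 1 KB in every format, so this example only illustrates how the encoding affects size.

```python
import csv
import io
import json
import struct

# Hypothetical sample record
record = {"id": 42, "sensor": "s-1", "value": 21.5}


def to_json(rec):
    """Self-describing text: field names travel with every message."""
    return json.dumps(rec).encode("utf-8")


def to_csv(rec):
    """Values only: the consumer must know the column order."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(rec.values())
    return buffer.getvalue().rstrip("\r\n").encode("utf-8")


def to_binary(rec):
    """Fixed layout: unsigned 32-bit id, 8-byte sensor name, 64-bit float."""
    return struct.pack("<I8sd", rec["id"], rec["sensor"].encode("utf-8"), rec["value"])


for name, encode in (("json", to_json), ("csv", to_csv), ("binary", to_binary)):
    print(f"{name}: {len(encode(record))} bytes")
```

JSON repeats the field names in every message, CSV drops them, and the binary layout has a fixed size regardless of the values. Which one is smallest depends on the data, and both CSV and binary push the schema knowledge onto the consumer.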

Create a topic for benchmarking

$ kafka-topics.sh --create --topic payload-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

Install confluent-kafka and avro

$ pip3 install confluent-kafka avro

Then we create a python script ‘payload_benchmark.py’ with the following content. When we call the script, we pass the payload type
https://gist.github.com/arockianirmal26/8afde90d1ac478c0e9369d5d09b8a654

JSON payloads
$ python payload_benchmark.py --payload-type json

KAFKA PAYLOAD BENCHMARK RESULTS

Payload type: json
Messages per batch: 100
Total messages sent: 10000
Total batches: 100
Average message size: 1028.19 bytes
Total duration: 2.71 seconds
Throughput: 3688.98 messages/second
Data throughput: 3.62 MB/second
Batch rate: 36.89 batches/second
Average batch time: 27.10 ms
Min batch time: 23.99 ms
Max batch time: 58.06 ms

CSV payloads
$ python payload_benchmark.py --payload-type csv

KAFKA PAYLOAD BENCHMARK RESULTS

Payload type: csv
Messages per batch: 100
Total messages sent: 10000
Total batches: 100
Average message size: 1024.00 bytes
Total duration: 0.90 seconds
Throughput: 11105.94 messages/second
Data throughput: 10.85 MB/second
Batch rate: 111.06 batches/second
Average batch time: 9.00 ms
Min batch time: 6.42 ms
Max batch time: 57.37 ms

Avro payloads
$ python payload_benchmark.py --payload-type avro

KAFKA PAYLOAD BENCHMARK RESULTS

Payload type: avro
Messages per batch: 100
Total messages sent: 10000
Total batches: 100
Average message size: 1025.00 bytes
Total duration: 0.99 seconds
Throughput: 10124.23 messages/second
Data throughput: 9.90 MB/second
Batch rate: 101.24 batches/second
Average batch time: 9.87 ms
Min batch time: 7.76 ms
Max batch time: 32.98 ms

Binary payloads
$ python payload_benchmark.py --payload-type binary

KAFKA PAYLOAD BENCHMARK RESULTS

Payload type: binary
Messages per batch: 100
Total messages sent: 10000
Total batches: 100
Average message size: 1024.00 bytes
Total duration: 0.65 seconds
Throughput: 15319.07 messages/second
Data throughput: 14.96 MB/second
Batch rate: 153.19 batches/second
Average batch time: 6.52 ms
Min batch time: 4.78 ms
Max batch time: 14.10 ms

String payloads
$ python payload_benchmark.py --payload-type string

KAFKA PAYLOAD BENCHMARK RESULTS

Payload type: string
Messages per batch: 100
Total messages sent: 10000
Total batches: 100
Average message size: 1024.00 bytes
Total duration: 0.69 seconds
Throughput: 14598.03 messages/second
Data throughput: 14.26 MB/second
Batch rate: 145.98 batches/second
Average batch time: 6.85 ms
Min batch time: 5.00 ms
Max batch time: 10.76 ms

Performance Tuning - Consumer Side

The bin/kafka-consumer-perf-test.sh script in a Kafka installation is a command-line tool provided by Apache Kafka to evaluate the performance of a Kafka consumer. It simulates consuming messages from a Kafka topic, allowing users to measure throughput, latency, and other performance metrics under various configurations.

The script accepts several parameters to customize the test. Below are the most commonly used ones:

Parameter	Description	Example Value
--topic	The Kafka topic to consume messages from	test-topic
--messages	The total number of messages to consume	1000000
--bootstrap-server	Kafka broker(s) to connect to (comma-separated list)	localhost:9092
--group	Consumer group ID for the consumer	test-consumer-group
--consumer.config	Path to a properties file containing consumer configurations	consumer.properties
--print-metrics	If specified, prints detailed consumer metrics after the test	(Flag, no value needed)
--from-latest	Start consuming from the latest offset (overrides default behavior).	(Flag, no value needed)
--timeout	Maximum time (in ms) to wait for messages before exiting	30000

Number of Consumers

Increasing the number of consumers in a Kafka consumer group can improve throughput, but the impact depends on your setup. Suppose you need to consume from a topic with three partitions:

If you have 1 consumer → it will read from all 3 partitions (serially or with multiple threads).

If you have 2 consumers in a single group → they will split the 3 partitions (1 will read 2 partitions, the other 1).

If you have 3 consumers → perfect one-to-one mapping (each consumer gets one partition).

If you have 4+ consumers in the same group → some consumers will be idle (because there are only 3 partitions).

More consumers → more parallelism → better throughput and lower lag until you hit the partition limit.
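
The partition limit can be illustrated with a small model of the assignment. This is a simplified sketch: it distributes partitions round-robin, whereas the real result depends on the configured partition assignor. The function name assign_partitions is made up for this example.

```python
def assign_partitions(num_partitions, num_consumers):
    """Distribute the partitions of one topic across a consumer group.

    Simplified round-robin model; each partition goes to exactly one
    consumer of the group.
    """
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


# A topic with three partitions and a growing consumer group
for consumers in (1, 2, 3, 4):
    assignment = assign_partitions(3, consumers)
    idle = [c for c, parts in assignment.items() if not parts]
    print(f"{consumers} consumer(s): {assignment}, idle: {idle}")
```

With four consumers, the fourth one receives no partition and stays idle, which is why adding consumers beyond the partition count does not help.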

To test this, we create a topic, produce messages to it, and consume them with a single consumer

$ kafka-topics.sh --create --topic consumer-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

$ kafka-producer-perf-test.sh \
--topic consumer-topic \
--num-records 300000 \
--record-size 100 \
--throughput -1 \
--producer-props "bootstrap.servers=localhost:9092"

$ kafka-consumer-perf-test.sh \
--bootstrap-server localhost:9092 \
--topic consumer-topic \
--messages 300000 \
--group perf-test-group-1 \
--timeout 30000

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 15:51:13:442, 2025-04-13 15:51:16:770, 28.6102, 8.5968, 300000, 90144.2308, 3159, 169, 169.2913, 1775147.9290
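
The columns of this output are related to each other, which is easy to check with a few lines of Python. The sketch below parses the output shown above; the timestamp format string is an assumption derived from the sample values.

```python
import csv
import io
from datetime import datetime

# Output copied from the run above
output = """start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 15:51:13:442, 2025-04-13 15:51:16:770, 28.6102, 8.5968, 300000, 90144.2308, 3159, 169, 169.2913, 1775147.9290"""

# skipinitialspace drops the blank after each comma
row = next(csv.DictReader(io.StringIO(output), skipinitialspace=True))

# Milliseconds are separated by a colon in the perf-test output
time_format = "%Y-%m-%d %H:%M:%S:%f"
start = datetime.strptime(row["start.time"], time_format)
end = datetime.strptime(row["end.time"], time_format)
elapsed_ms = round((end - start).total_seconds() * 1000)

rebalance_ms = int(row["rebalance.time.ms"])
messages = int(row["data.consumed.in.nMsg"])

print(f"elapsed: {elapsed_ms} ms, of which rebalance: {rebalance_ms} ms")
print(f"time left for fetching: {elapsed_ms - rebalance_ms} ms")
print(f"overall rate: {messages / (elapsed_ms / 1000):.1f} msgs/sec")
```

In this run, about 3.2 of the 3.3 seconds are spent on the initial group rebalance, and the remaining 169 ms match the fetch.time.ms column. That explains why nMsg.sec (about 90K) is so much lower than fetch.nMsg.sec (about 1.8M): for short tests, the rebalance dominates the overall rate.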

Now create a script called ‘run-consumers.sh’ with the following content.
https://gist.github.com/arockianirmal26/1657633fca872508600b886b59c83fb4

Now start 3 consumers with the following command
$ ./run-consumers.sh 3

The logs are in the log files (consumer_1.log, consumer_2.log, consumer_3.log)

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 15:32:56:559, 2025-04-13 15:33:12:807, 0.3105, 0.0191, 3256, 200.3939, 6172, 10076, 0.0308, 323.1441

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 15:32:56:559, 2025-04-13 15:33:12:812, 0.3044, 0.0187, 3192, 196.3945, 6174, 10079, 0.0302, 316.6981

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 15:32:56:559, 2025-04-13 15:33:12:804, 0.3387, 0.0209, 3552, 218.6519, 6174, 10071, 0.0336, 352.6959

So in this test, a single consumer handled the 300K messages much faster than the three consumers combined. I often received the timeout error ‘Exiting before consuming the expected number of messages: timeout (120000 ms) exceeded. You can use the --timeout option to increase the timeout.’ even with increased timeouts. Since all consumers run locally on the same machine, CPU or disk I/O contention might be slowing them down.

The strength of Kafka is that you can have several brokers in your cluster and each one can be the leader of some partitions. Then each consumer will be connected to a few different brokers so each consumer can use each broker’s resources.

fetch.max.wait.ms/fetch.min.bytes

The fetch.max.wait.ms and fetch.min.bytes properties in a Kafka consumer control how it retrieves messages from brokers, balancing latency and throughput.

fetch.max.wait.ms: The maximum time (in milliseconds) the broker waits before answering a fetch request when less than fetch.min.bytes of data is available. Default is 500ms. A higher value reduces the number of requests but increases latency if data arrives slowly.

fetch.min.bytes: The minimum amount of data (in bytes) the broker must have before responding to a fetch request. Default is 1 byte. Setting a higher value improves throughput by batching more data but may delay responses if data is sparse.
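
How the two settings interact can be sketched with a small model. This is a simplification and an assumption about the behaviour, not broker code: the broker answers as soon as at least fetch.min.bytes are available, or after fetch.max.wait.ms, whichever comes first. The function name fetch_wait_ms and the data rates are made up for this example.

```python
def fetch_wait_ms(available_bytes, incoming_bytes_per_ms, min_bytes, max_wait_ms):
    """Estimate how long a fetch request is held before it is answered.

    Toy model: answer once min_bytes are available, or after
    max_wait_ms, whichever comes first.
    """
    if available_bytes >= min_bytes:
        return 0.0
    if incoming_bytes_per_ms <= 0:
        return float(max_wait_ms)
    time_to_fill = (min_bytes - available_bytes) / incoming_bytes_per_ms
    return min(time_to_fill, float(max_wait_ms))


# Defaults: a single byte is enough, so the answer comes almost at once
print(fetch_wait_ms(0, 100, min_bytes=1, max_wait_ms=500))
# 1 MB minimum on a slow topic: the wait is capped by fetch.max.wait.ms
print(fetch_wait_ms(0, 100, min_bytes=1_048_576, max_wait_ms=1000))
# A backlog larger than the minimum: the request returns immediately
print(fetch_wait_ms(2_000_000, 0, min_bytes=1_048_576, max_wait_ms=1000))
```

This is consistent with the pattern in the results below: while a large backlog is available, a higher fetch.min.bytes makes little difference, but once less than the minimum is left, each fetch can be held for up to fetch.max.wait.ms.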

To play around we create a new topic and produce messages into it
$ kafka-topics.sh --create --topic test-fetch-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

$ kafka-producer-perf-test.sh \
--topic test-fetch-topic \
--num-records 300000 \
--record-size 100 \
--throughput -1 \
--producer-props "bootstrap.servers=localhost:9092"

Now we can consume the messages with a single consumer using the default fetch.max.wait.ms (500ms) and fetch.min.bytes (1 byte) values

$ kafka-consumer-perf-test.sh \
--bootstrap-server localhost:9092 \
--topic test-fetch-topic \
--messages 300000 \
--group test-fetch-group \
--timeout 30000

Create a file ‘consumer-config.properties’ with the following content

fetch.min.bytes=1
fetch.max.wait.ms=100

Before consuming the messages we need to reset the offset

$ kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--group test-fetch-group \
--topic test-fetch-topic \
--reset-offsets \
--to-earliest \
--execute

Tested for the following different combinations and here are the results

$ kafka-consumer-perf-test.sh \
--bootstrap-server localhost:9092 \
--topic test-fetch-topic \
--messages 300000 \
--group test-fetch-group \
--consumer.config consumer-config.properties \
--print-metrics \
--timeout 30000

Default
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 19:22:44:593, 2025-04-13 19:22:47:940, 28.6102, 8.5480, 300000, 89632.5067, 3166, 181, 158.0676, 1657458.5635

fetch.min.bytes=1 / fetch.max.wait.ms=100
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 19:32:06:908, 2025-04-13 19:32:10:236, 28.6102, 8.5968, 300000, 90144.2308, 3160, 168, 170.2990, 1785714.2857

fetch.min.bytes=65536 / fetch.max.wait.ms=1000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 19:35:26:508, 2025-04-13 19:35:29:861, 28.6102, 8.5327, 300000, 89472.1145, 3171, 182, 157.1991, 1648351.6484

fetch.min.bytes=1048576 / fetch.max.wait.ms=1000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2025-04-13 19:37:22:217, 2025-04-13 19:37:26:647, 28.6102, 6.4583, 300000, 67720.0903, 3204, 1226, 23.3362, 244698.2055

Setting	Messages/sec	Fetch Time	Good For
Default (min=1 B / max=500 ms)	~89.6K	181 ms	General purpose
min=1 B / max=100 ms	~90K	168 ms	Low-latency / High-frequency
min=64 KB / max=1000 ms	~89.5K	182 ms	Slight batching, no major gain
min=1 MB / max=1000 ms	~67.7K	1226 ms	Fewer fetch requests; higher latency and, in this test, lower throughput

Processing Time

The --sleep-ms parameter sets a delay (in milliseconds) after each message is consumed. This simulates the time a consumer needs to process a message before it can poll and consume the next one.

Create a new topic and produce messages into it

$ kafka-topics.sh --create --topic process-messages-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3

$ kafka-producer-perf-test.sh \
--topic process-messages-topic \
--num-records 300000 \
--record-size 100 \
--throughput -1 \
--producer-props "bootstrap.servers=localhost:9092"

To consume the messages from the topic with a simulated processing time (sleep-ms), we create a consumer using ‘kafka-python’. Install kafka-python if it is not installed yet.

$ pip3 install kafka-python

Then we create a python script ‘consumer_simulated_delay.py’ with the following content
https://gist.github.com/arockianirmal26/ecf6bf951c375d2fb1b6fa600bebdfe3

Now execute the script for different values of ‘SIMULATED_PROCESSING_TIME’. Remember to reset the offset for the consumer group before every run.

$ python consumer_simulated_delay.py

Here are the results

SIMULATED_PROCESSING_TIME = 0.001 # 1ms (simple processing)
Starting consumer with simulated processing time: 1.0ms
Messages processed: 1000
Total time: 4.93s
Average processing time: 1.27ms
Throughput: 202.67 messages/second

SIMULATED_PROCESSING_TIME = 0.01 # 10ms (fast processing)
Starting consumer with simulated processing time: 10.0ms
Messages processed: 1000
Total time: 15.36s
Average processing time: 11.82ms
Throughput: 65.12 messages/second

SIMULATED_PROCESSING_TIME = 0.1 # 100ms (normal processing)
Starting consumer with simulated processing time: 100.0ms
Messages processed: 1000
Total time: 106.60s
Average processing time: 103.02ms
Throughput: 9.38 messages/second

SIMULATED_PROCESSING_TIME = 1 # 1 second (slow processing)
Starting consumer with simulated processing time: 1000ms
Messages processed: 500
Total time: 508.19s
Average processing time: 1003.25ms
Throughput: 0.98 messages/second
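
These numbers can be compared with a simple upper bound: a consumer that handles one message at a time cannot process more than 1 / processing time messages per second. The sketch below applies that bound to the measurements above; max_throughput is a name chosen for this example.

```python
# (simulated processing time in seconds, measured messages/second) from above
runs = [(0.001, 202.67), (0.01, 65.12), (0.1, 9.38), (1.0, 0.98)]


def max_throughput(processing_time_s):
    """Upper bound for a consumer that handles one message at a time."""
    return 1.0 / processing_time_s


for processing_time, measured in runs:
    ceiling = max_throughput(processing_time)
    print(f"{processing_time * 1000:6.0f} ms: ceiling {ceiling:7.1f} msgs/sec, "
          f"measured {measured:6.2f} ({measured / ceiling:.0%})")
```

For slow processing (100 ms and more), the measured throughput is within a few percent of the bound, so processing time is the bottleneck and only more partitions and consumers help. For fast processing (1 ms), the consumer reaches only about 20% of the bound, which suggests that fixed costs such as startup and polling dominate the short run.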

Reset the offset before every run

$ kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--group process-messages-test-group \
--topic process-messages-topic \
--reset-offsets \
--to-earliest \
--execute

About Arockia Nirmal Amala Doss
Arockia Nirmal Amala Doss is an experienced Data Engineer with over 10 years of expertise in data architectures. As a certified Apache Kafka Developer and AWS Data Engineer, he specializes in scalable event streaming pipelines and cloud-native data solutions. His clients from the DAX environment value his practical experience with Amazon MSK, ETL/ELT processes, and developing robust data infrastructures.
