I have been wanting to test Apache Kafka for some time now and finally got around to it! In this blog post I give a very short introduction to Kafka, walk through the installation and configuration of a Kafka cluster, and finally benchmark a few near real-world scenarios on OpenStack VMs.
What is Apache Kafka?
Apache Kafka is a distributed publish-subscribe message broker. The core abstraction in Kafka is the topic: an append-only log to which producers publish their records and from which consumers read. A Kafka topic is divided into multiple partitions, and these partitions are distributed across the servers in a Kafka cluster, which makes it scalable. Each partition is replicated across a configurable number of servers for fault tolerance, and messages in the topic are retained for a configurable period of time for durability. The whole thing looks like below.
You can read more about Kafka in the official documentation.
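To make the abstraction concrete, here is a minimal sketch of publishing and consuming a few messages with the console tools that ship with Kafka; the hostname and topic name are placeholders, and a broker is assumed to be running locally:

# create a small demo topic (assumes ZooKeeper on localhost:2181)
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic demo --partitions 3 --replication-factor 1

# publish a message from stdin (assumes a broker on localhost:9092)
echo "hello kafka" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo

# read the topic from the beginning
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic demo --from-beginning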
Installation and Configuration
The installation is quite straightforward; you can have one broker per server (recommended) or multiple brokers per server. You also need ZooKeeper for the coordination of the Kafka cluster. The good news is that ZooKeeper is bundled with the Kafka package along with handy startup and shutdown scripts. It is also recommended to run ZooKeeper on servers outside the Kafka cluster.
Zookeeper Installation
# download and unpack the software
mkdir /software
cd /software
curl -O http://www-eu.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar xvfz kafka_2.11-0.10.0.1.tgz

# change essential configuration in config/zookeeper.properties
dataDir - data directory for zookeeper

# start zookeeper
cd $KAFKA_HOME
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
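To verify that ZooKeeper is up, you can use its four-letter-word commands over the client port; a quick sketch:

# "ruok" should come back with "imok" if ZooKeeper is healthy
echo ruok | nc localhost 2181

# "stat" prints version, latency and connection details
echo stat | nc localhost 2181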
Kafka Installation
# download and unpack the software
mkdir /software
cd /software
curl -O http://www-eu.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar xvfz kafka_2.11-0.10.0.1.tgz

# change essential configuration in config/server.properties
zookeeper.connect   - points to the zookeeper; e.g host:2181
broker.id           - broker id must be unique across kafka cluster; e.g 101
log.dirs            - location to store the topics; e.g /kdata/logs
log.retention.hours - topics retention time

# start kafka
cd $KAFKA_HOME
$KAFKA_HOME/bin/kafka-server-start.sh -daemon config/server.properties
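Once every broker is started, a quick sanity check is to create a small topic and confirm that all brokers show up in the leader/replica assignment; a sketch, assuming ZooKeeper runs on habench001 and a throwaway topic name:

# list the topics known to the cluster
bin/kafka-topics.sh --zookeeper habench001:2181 --list

# create a small test topic and check leader/replica assignment per partition
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic smoke --partitions 3 --replication-factor 1
bin/kafka-topics.sh --zookeeper habench001:2181 --describe --topic smoke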
As you can see, it's fairly easy to set up a Kafka cluster.
Benchmarking Kafka on OpenStack VMs
Now that all the infra work is out of the way, let's try to run some tests to demonstrate (or more likely to understand) how Kafka works and behaves!
Test setup
The Kafka cluster runs on three VMs; each VM has 4 VCPUs, 7.3 GB RAM, a 100 GB volume with an ext4 filesystem, and 10 Gb Ethernet.
ZooKeeper runs on a separate VM; this VM has 2 VCPUs, 3.7 GB RAM, a 100 GB volume with an ext4 filesystem, and 10 Gb Ethernet.
I ran the following dd commands to determine the disk throughput:
[root@habench103 kafka_2.11-0.10.0.1]# dd if=/dev/zero of=/kdata/test.dd bs=100M count=100 oflag=dsync
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 144.381 s, 72.6 MB/s

[root@habench103 kafka_2.11-0.10.0.1]# dd if=/dev/zero of=/kdata/test.dd bs=1G count=10 oflag=dsync
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 138.556 s, 77.5 MB/s
I have also tested the network with iperf, which confirms that the VMs have 10 Gb Ethernet. The following tests were run during a normal working day and repeated several times to weed out any outliers.
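For reference, a bandwidth check of this kind looks roughly like the following; iperf3 on two of the hosts is an assumption here:

# on habench101: start the server side
iperf3 -s

# on habench102: run a 30-second test against habench101
iperf3 -c habench101 -t 30   # should report close to 10 Gbits/sec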
Test 1: Single producer publishing 500-byte messages with no replication
The objective of this test is to understand the scalability of the Kafka cluster. I start with a one-node Kafka cluster and then add nodes one by one.
# create kafka topic
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic test --partitions 48 --replication-factor 1

# run the producer to publish events to Kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test --num-records 50000000 --record-size 500 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench102:9092 buffer.memory=104857600 batch.size=9000
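One detail worth noting when growing the cluster this way: Kafka does not spread existing partitions onto a new broker automatically, so the topic has to be recreated or its partitions reassigned after each node joins. A sketch of the reassignment route, where the JSON file contents and broker ids are hypothetical examples:

# topics-to-move.json: {"topics": [{"topic": "test"}], "version": 1}
# --generate only proposes an assignment; re-run with --execute to apply it
bin/kafka-reassign-partitions.sh --zookeeper habench001:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "101,102,103" --generate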
It is evident that Kafka throughput scales by adding additional nodes.
Test 2: Single producer publishing 1000-byte messages with no replication
This test is a repetition of the one above with a different message size, to understand the impact of the message size. Again, I start with a one-node Kafka cluster and then add nodes one by one.
# create kafka topic
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic test --partitions 48 --replication-factor 1

# run the producer to publish events to Kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test --num-records 50000000 --record-size 1000 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench102:9092 buffer.memory=104857600 batch.size=9000
Again, the throughput scales by adding additional nodes, and larger messages achieve higher throughput (in MB/sec) than smaller ones. This is understandable: for the same volume of data, smaller messages mean more messages, and therefore more per-message processing overhead.
Test 3: Single producer publishing 500-byte messages with (3x) and without replication
The objective of this test is to understand the cost of replication.
# create kafka topic (with replication)
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic testr3 --partitions 48 --replication-factor 3

# publish messages to kafka topic with required settings
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 30000000 --record-size 500 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000
There is a cost involved in replicating the topic partitions, roughly 20-25%. Although not bad, it's worth noting that replication is not free!
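Note that this test uses acks=1, so the producer only waits for the partition leader and the followers replicate asynchronously in the background. To measure the cost of fully synchronous replication instead, the same command could be run with acks=all; a variant not benchmarked here:

# same workload, but the producer waits until all in-sync replicas have the record
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 30000000 --record-size 500 \
  --throughput -1 --producer-props acks=all bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000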
Test 4: Three producers, 3x async replication with different message sizes
The objective of this test is to understand the effect of the message size on the producer throughput.
# publish 200 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 30000000 --record-size 200 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000

# publish 500 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 15000000 --record-size 500 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000

# publish 1000 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 10000000 --record-size 1000 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000
As can be seen from the above chart, the throughput dropped significantly when the message size is 1000 bytes. Although this is unexpected, I got the same results even after repeating the tests a few times.
Test 5: Throughput with single and multiple producers
The objective of this test is to see the benefits of multiple producers, even when a single producer is not limited by the network bandwidth.
500-byte message size and 3x async replication, with single and multiple producers
In our case, since we have 10 Gb Ethernet, the network throughput is not a bottleneck for a single producer; nevertheless, the aggregate throughput can still be increased by adding more producers.
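The exact commands for this test are not shown above; one way to drive several producers at once is to launch the same ProducerPerformance command from multiple hosts in parallel, roughly like this (the hostnames and install path are assumptions):

# start one producer per host against the same topic, then wait for all of them
for h in habench101 habench102 habench103; do
  ssh $h "cd /software/kafka_2.11-0.10.0.1 && \
    bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 \
    --num-records 15000000 --record-size 500 --throughput -1 \
    --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000" &
done
wait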
Test 6: Producers and Consumers in harmony
Finally, in real production deployments it is quite natural to have producers and consumers writing and reading data at the same time.
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 20000000 --record-size 500 \
  --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000

bin/kafka-consumer-perf-test.sh --zookeeper habench001:2181 --messages 20000000 --topic testr3 --threads 1
The results of running a combination of producers and consumers are as below:
|                     | One Producer and One Consumer | Three Producers and Three Consumers |
| ------------------- | ----------------------------- | ----------------------------------- |
| Producer throughput | 124.47 MB/sec                 | 155 MB/sec                          |
| Messages/sec        | 261038                        | 330009                              |
| Consumer throughput | 83.71 MB/sec                  | 364 MB/sec                          |
Learnings & Conclusion
- The number of topic partitions significantly impacts both producer and consumer throughput.
- The message size and batch size should also be looked at. Although not much can be done about the message size, the batch size should be tuned in line with the latency requirements to achieve maximum throughput.
- The producers need on the order of a couple of GB of memory, while the CPUs on the Kafka cluster stayed fairly quiet.
- num.io.threads on the Kafka broker can be tuned to further optimize the throughput; the right value again depends on the storage you have (see the sketch after this list).
- The hardware requirements for the Kafka cluster are not very demanding and it seems to scale very well.
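To make the tuning knobs above concrete, here is a minimal sketch of where they live; the values are illustrative assumptions, not recommendations:

# broker side: config/server.properties
num.io.threads=8          # threads doing disk I/O; scale with the number of log.dirs / disks
num.network.threads=3     # threads handling network requests

# producer side: properties passed via --producer-props
batch.size=16384          # bigger batches raise throughput at the cost of latency
linger.ms=5               # wait up to 5 ms for a batch to fill before sending
buffer.memory=104857600   # total memory available for buffering unsent records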