Creating a Kafka Cluster on Ubuntu 18.04

This tutorial is designed for readers with basic knowledge of Kafka, Ubuntu, and programming. You can read more about Kafka on the official site.
In this article, we will discuss how to set up a Kafka cluster with 3 brokers on a single node.

Kafka is a distributed messaging system based on the principle of the pub-sub (publish-subscribe) model. It allows us to publish and subscribe to a stream of records that can be categorized. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on the disk and replicated within the cluster to prevent data loss. It is an incredibly fast, highly scalable, fault-tolerant system, and it’s designed to process large amounts of data in real time.

Advantages:
· Reliability − Kafka is distributed, partitioned, replicated, and fault tolerant.
· Scalability − The Kafka messaging system scales easily without downtime.
· Durability − Kafka uses a distributed commit log, which means messages are persisted on disk as fast as possible; hence it is durable.
· Performance − Kafka has high throughput for both publishing and subscribing to messages. It maintains stable performance even when many TB of messages are stored.

The basic architecture components of Kafka are as follows:
Zookeeper: a coordinator for the brokers in a cluster.
Topic: a category to which message producers publish messages.
Brokers: a broker instance handles message reads and writes.
Producers: applications that insert data into the cluster.
Consumers: applications that read data from the cluster.

Installing Apache Kafka

Apache Kafka requires Java to be installed on your Ubuntu 18.04 machine. First, update the OS packages.

update the OS
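A minimal sketch of the update step, assuming an apt-based system and an account with sudo rights. The OS_UPDATED variable is only an illustrative status flag, not part of the original steps.

```shell
# Refresh the package index, then upgrade the installed packages.
if sudo apt-get update && sudo apt-get -y upgrade; then
  OS_UPDATED=yes
else
  OS_UPDATED=no   # for example: no sudo rights or no network access
fi
echo "OS updated: $OS_UPDATED"
```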

After the OS is updated, go ahead and install Java, because Kafka needs a Java runtime environment.
First, verify whether a Java version is already installed.
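The version check can be run as shown here. The HAVE_JAVA flag is a hypothetical helper added only to summarize the result.

```shell
# Prints the runtime version when Java is installed; otherwise the shell
# reports that the java command was not found.
java -version && HAVE_JAVA=yes || HAVE_JAVA=no
echo "Java installed: $HAVE_JAVA"
```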

If Java is not currently installed, you’ll see the following output:
Command 'java' not found, but can be installed with:

Installing Java
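One possible install command, assuming the default-jdk package from the Ubuntu repositories (OpenJDK 11 on Ubuntu 18.04). The article does not say which Java package it used, so treat the package name as an assumption.

```shell
JAVA_PKG=default-jdk   # assumption: another Java 8 or 11 package should also work with Kafka 2.7.0
# Install the package, then confirm that the java command is now available.
sudo apt-get install -y "$JAVA_PKG" && java -version
```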

Downloading Kafka
Next, download the Kafka release archive to your Ubuntu 18.04 machine. It is highly recommended to download it from the official Apache Kafka website. You can also select any nearby mirror for the download.
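As a sketch, the archive that the extract step expects can be fetched with wget. The Apache archive URL below is an assumption, since the download link from the article is not shown.

```shell
KAFKA_TGZ=kafka_2.13-2.7.0.tgz
# Assumed location of the 2.7.0 release; a nearby mirror works the same way.
KAFKA_URL="https://archive.apache.org/dist/kafka/2.7.0/$KAFKA_TGZ"
wget "$KAFKA_URL" || echo "download failed; try a nearby mirror instead"
```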

Then extract the file with the following command:
tar -xzf kafka_2.13-2.7.0.tgz
Let us create a new folder named kafka in the /usr/local directory: /usr/local/kafka.
Then move the extracted Kafka files to the /usr/local/kafka directory.

Move the kafka extract files
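The two steps might look like this, assuming the archive was extracted in the current directory and that sudo is needed to write under /usr/local.

```shell
KAFKA_HOME=/usr/local/kafka
# Create the target folder, then move the extracted files into it.
[ -d kafka_2.13-2.7.0 ] &&
  sudo mkdir -p "$KAFKA_HOME" &&
  sudo mv kafka_2.13-2.7.0/* "$KAFKA_HOME"/
```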

List the Kafka script and config files
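For example, from the /usr/local/kafka directory created earlier:

```shell
KAFKA_HOME=/usr/local/kafka
# bin holds the start, stop, and admin scripts; config holds the properties files.
cd "$KAFKA_HOME" && ls bin config
```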

Start Zookeeper Service
Zookeeper is a key-value store used to maintain server state, and it is mandatory for running Kafka. It is a centralized system for maintaining configuration, and it also handles leader election.
Go to the Kafka directory to start the Zookeeper service.
Before you start it, however, you can check the properties file and change dataDir if needed.

zookeeper properties
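A quick way to inspect the relevant settings without opening an editor. The stock file shipped with Kafka is expected to set dataDir to /tmp/zookeeper, but yours may differ.

```shell
KAFKA_HOME=/usr/local/kafka
# Show the data directory and the client port from the Zookeeper config.
cd "$KAFKA_HOME" && grep -E '^(dataDir|clientPort)=' config/zookeeper.properties
```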

By default, Zookeeper listens on *:2181/tcp.

Start the Zookeeper service.
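Assuming Kafka lives in /usr/local/kafka, the bundled start script takes the properties file as its argument. It runs in the foreground, so keep this terminal open.

```shell
KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && bin/zookeeper-server-start.sh config/zookeeper.properties
```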

If you want to stop the Zookeeper server, use its stop script.
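The bundled stop script needs no arguments and can be run from a second terminal. The install directory is the one assumed throughout.

```shell
KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && bin/zookeeper-server-stop.sh
```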

Then start the Zookeeper service again.
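To restart, the same start script can be used. As an optional variation that is not part of the original steps, the -daemon flag runs it in the background so the terminal stays free.

```shell
KAFKA_HOME=/usr/local/kafka
# -daemon detaches the process; drop it to run in the foreground instead.
cd "$KAFKA_HOME" && bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
```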

Once the Zookeeper service has started successfully, you can go ahead and start the Kafka service.

Setting Up a Kafka Cluster
To set up multiple brokers on a single node, a separate server properties file is required for each broker. Each properties file defines different values for the following properties:
1. the broker ID
2. the port
3. log.dir

As we will have 3 brokers, we need a properties file for each broker. Let’s copy the default server properties file in the config/ directory so that each instance has its own file.
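A sketch of the copy step. config/server.properties is the default broker file shipped with Kafka; the names server-1.properties and server-2.properties are assumptions chosen to match the Server-1 and Server-2 brokers started later.

```shell
KAFKA_HOME=/usr/local/kafka
# The default file stays as it is for the first broker; the copies serve the other two.
cd "$KAFKA_HOME" &&
  cp config/server.properties config/server-1.properties &&
  cp config/server.properties config/server-2.properties
```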

Edit both new properties files as follows, starting with server-1.

The broker ID must be unique for each node in the cluster. We must also change the port and the log directory.
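These three edits can also be scripted. In this sketch, set_broker is a hypothetical helper, and the file name, ID 1, port 9093, and log directory /tmp/kafka-logs-1 are assumed values. In the stock Kafka properties file the port is set through the listeners line and the log directory through log.dirs.

```shell
# set_broker FILE ID PORT: rewrite the three per-broker settings in FILE.
set_broker() {
  sed -E \
    -e "s|^broker\.id=.*|broker.id=$2|" \
    -e "s|^#?listeners=.*|listeners=PLAINTEXT://:$3|" \
    -e "s|^log\.dirs=.*|log.dirs=/tmp/kafka-logs-$2|" \
    "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && set_broker config/server-1.properties 1 9093
```

The listeners pattern also matches the commented-out default line, so the setting is uncommented and changed in one pass.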

Edit the config for server-2 as follows.

As with server-1, the broker ID must be unique, and the port and log directory must differ from those of the other brokers.
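A possible in-place edit for the second copy, with assumed values (ID 2, port 9094, /tmp/kafka-logs-2) and an assumed file name. It relies on the -i option of GNU sed as shipped with Ubuntu.

```shell
KAFKA_HOME=/usr/local/kafka
BROKER_ID=2
PORT=9094                    # assumed free port
LOG_DIR=/tmp/kafka-logs-2    # assumed log directory
cd "$KAFKA_HOME" && sed -i -E \
  -e "s|^broker\.id=.*|broker.id=$BROKER_ID|" \
  -e "s|^#?listeners=.*|listeners=PLAINTEXT://:$PORT|" \
  -e "s|^log\.dirs=.*|log.dirs=$LOG_DIR|" \
  config/server-2.properties
```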

Starting the 3 brokers in the cluster
Now we can start our brokers. Run these three commands in different terminal sessions.

Start the Kafka default Server
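Assuming the install directory /usr/local/kafka, the default broker starts from the stock properties file. Zookeeper must already be running.

```shell
KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && bin/kafka-server-start.sh config/server.properties
```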

Start the Kafka Server-1
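In a second terminal, the same script is pointed at the properties file of the second broker. The file name server-1.properties is an assumption.

```shell
KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && bin/kafka-server-start.sh config/server-1.properties
```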

Start the Kafka Server-2
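In a third terminal, likewise for the last broker. The file name server-2.properties is again an assumption.

```shell
KAFKA_HOME=/usr/local/kafka
cd "$KAFKA_HOME" && bin/kafka-server-start.sh config/server-2.properties
```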

Creating a Kafka topic
Kafka stores and organizes messages in collections known as topics. A topic is then divided into partitions, each of which contains a subset of the topic’s messages. A broker can host multiple partitions. Why are there multiple partitions for a topic? Primarily to increase throughput, because the partitions can be accessed in parallel.
Let’s create a topic with replication factor 3 and 1 partition, since we have 3 Kafka brokers running. We will name the topic AshishJadhavsClusterTopic.
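The topic could be created as follows. This sketch assumes the default broker listens on localhost:9092 and uses the --bootstrap-server option of kafka-topics.sh, which Kafka 2.7.0 supports.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsClusterTopic
cd "$KAFKA_HOME" && bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 3 --partitions 1 \
  --topic "$TOPIC"
```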

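The partition count sets how many pieces the topic’s data is split into; those pieces are then spread across the brokers. As we have 3 brokers, setting it up to 3 lets every broker hold a share.
The replication factor sets how many copies of the data are kept, and it cannot exceed the number of brokers. This is helpful when a broker goes down, because the other brokers can take over the job.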

Let’s verify that the topic was created.
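One way to check, assuming the same broker address: the --describe option reports the partition count, the replication factor, and the leader, replicas, and in-sync replicas of each partition.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsClusterTopic
cd "$KAFKA_HOME" && bin/kafka-topics.sh --describe \
  --bootstrap-server localhost:9092 --topic "$TOPIC"
```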

Let’s create a topic with replication factor 3 and 2 partitions, since we have 3 Kafka brokers running, and name it AshishJadhavsclusterPartionTopic.
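A sketch of the second create command, under the same assumptions about the broker address. Only the partition count and the topic name change.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsclusterPartionTopic
cd "$KAFKA_HOME" && bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 3 --partitions 2 \
  --topic "$TOPIC"
```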

Let’s verify that the topic was created.
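As a variation on the check, --list shows every topic known to the cluster, and --describe then shows the details of the two-partition topic. The broker address is assumed as before.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsclusterPartionTopic
cd "$KAFKA_HOME" &&
  bin/kafka-topics.sh --list --bootstrap-server localhost:9092 &&
  bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic "$TOPIC"
```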

Starting a producer for sending messages
A producer is an entity/application that publishes data to a Kafka cluster, which is made up of brokers. A producer can publish to multiple topics. You can define what your topics are and which topics a producer publishes to. The broker is responsible for receiving and storing the data when a producer publishes.
Let’s push a message with a producer to the AshishJadhavsclusterPartionTopic topic we created. The producer feeds data into the Kafka cluster.
Note: the broker-list option takes the list of brokers we have created.

Create a producer:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic AshishJadhavsclusterPartionTopic
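To pass the full broker list that the note describes, the addresses are separated by commas. In this sketch the ports 9093 and 9094 are assumptions for the second and third broker, and the two test messages are piped in instead of typed at the prompt.

```shell
KAFKA_HOME=/usr/local/kafka
BROKERS=localhost:9092,localhost:9093,localhost:9094   # 9093 and 9094 are assumed ports
TOPIC=AshishJadhavsclusterPartionTopic
# Each input line becomes one message on the topic.
cd "$KAFKA_HOME" && printf 'first message\nsecond message\n' |
  bin/kafka-console-producer.sh --broker-list "$BROKERS" --topic "$TOPIC"
```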

Starting a consumer for consuming messages
Applications that need to read data from Kafka use a KafkaConsumer to subscribe to Kafka topics and receive messages from these topics.
Let’s pull the messages we published in the previous step from the topic AshishJadhavsclusterPartionTopic with a consumer.
We will create 3 consumers that pull the messages published to the topic on the Kafka brokers.

Consumer #1: run this command to consume the messages.
Note: bootstrap-server is a broker we have created. It can be any of our 3 brokers.
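A sketch for the first consumer, assuming the default broker on localhost:9092. The --from-beginning option replays messages that were published before the consumer started.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsclusterPartionTopic
cd "$KAFKA_HOME" && bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic "$TOPIC" --from-beginning
```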

Consumer #2: run this command to consume the messages.
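For the second consumer, a different broker can serve as the entry point. The port 9093 is an assumption for the second broker. Because no consumer group is given, the console consumer is expected to use its own group and receive every message.

```shell
KAFKA_HOME=/usr/local/kafka
TOPIC=AshishJadhavsclusterPartionTopic
cd "$KAFKA_HOME" && bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9093 \
  --topic "$TOPIC" --from-beginning
```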

Consumer #3: run this command to consume the messages.

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic AshishJadhavsclusterPartionTopic

I hope the steps in this article help you set up and configure a Kafka cluster on Ubuntu 18.04. Please experiment and share your feedback.

Thank you for reading.
Ashish Jadhav

Ashish is a well-known open source technology leader with expertise in Microservices, Blockchain & DevOps. He is a speaker and blogger on open source technologies.