Creating a Kafka Cluster in Ubuntu 18.04

Ashish Jadhav
Apr 6, 2021

This tutorial is designed for readers with basic knowledge of Kafka, Ubuntu and programming. You can read more about Kafka on the official site https://kafka.apache.org.
In this article, we will discuss how to set up a Kafka cluster with 3 brokers on a single node.

Kafka is a distributed messaging system based on the principle of the pub-sub (publish-subscribe) model. It allows us to publish and subscribe to a stream of records that can be categorized. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on the disk and replicated within the cluster to prevent data loss. It is an incredibly fast, highly scalable, fault-tolerant system, and it’s designed to process large amounts of data in real time.

Advantages:
· Reliability − Kafka is distributed, partitioned, replicated and fault tolerant.
· Scalability − The Kafka messaging system scales easily without downtime.
· Durability − Kafka uses a distributed commit log, which means messages are persisted on disk as fast as possible; hence it is durable.
· Performance − Kafka has high throughput for both publishing and subscribing to messages. It maintains stable performance even when many TB of messages are stored.

The basic architecture components of Kafka are as follows:
Zookeeper: a coordinator for the brokers in the cluster.
Topic: a category to which messages are published by the message producers.
Brokers: server instances that handle message reads and writes.
Producers: applications that insert data into the cluster.
Consumers: applications that read data from the cluster.

Installing Apache Kafka

Apache Kafka requires Java to be installed on your Ubuntu 18.04 machine. First, refresh the package index with the following command:

sudo apt-get update

After the package index is updated, go ahead and install Java. Kafka needs a Java runtime environment, so first check whether one is already installed:

java -version

If Java is not currently installed, you’ll see output like the following:

Command 'java' not found, but can be installed with:

apt install default-jre
apt install openjdk-11-jre-headless
apt install openjdk-8-jre-headless

Installing Java

Install any one of the suggested packages, for example:

sudo apt install default-jre
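The check above can also be scripted, which is handy when the same setup is repeated on several machines. The following is a minimal sketch; the function name java_status is hypothetical and not part of any package.

```shell
# Report whether a Java runtime is on PATH, and its version if so.
java_status() {
  if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it to stdout.
    echo "installed: $(java -version 2>&1 | head -n 1)"
  else
    echo "missing"
  fi
}

java_status
```

If it prints "missing", install one of the packages listed above and run it again.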

Downloading Kafka
Next, download the Kafka binary release to your Ubuntu 18.04 machine. It’s highly recommended to download it from the official website of Apache Kafka; you can also select any nearby mirror. Releases that are no longer carried by the mirrors are kept at https://archive.apache.org/dist/kafka/.

wget http://www-us.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz

Then extract the file with the following command:

tar -xzf kafka_2.13-2.7.0.tgz

Move the extracted directory to /usr/local/kafka. As long as /usr/local/kafka does not already exist, mv places the extracted directory there under that name:

sudo mv kafka_2.13-2.7.0 /usr/local/kafka

After the move, the Kafka scripts are in /usr/local/kafka/bin and the config files are in /usr/local/kafka/config.

Start Zookeeper Service
Zookeeper is a centralized key-value store that Kafka uses to maintain cluster state and configuration. It is also used for leader election, and it is mandatory for running the Kafka release used here.
Go to the Kafka directory (/usr/local/kafka). Before starting the Zookeeper service, you can check the zookeeper.properties file in case you want to change dataDir:

nano config/zookeeper.properties

By default, Zookeeper listens on *:2181/tcp.
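To confirm dataDir and the client port without opening an editor, the values can be read straight from the properties file. This is a small sketch; prop_value is a hypothetical helper, and the demo runs on a throwaway file holding the two settings that the shipped config/zookeeper.properties defines by default.

```shell
# Print the value of KEY from a Java-style properties FILE.
# Comment lines (starting with #) are ignored; the last match wins.
prop_value() {
  file=$1
  key=$2
  grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1 | sed 's/^[^=]*=[[:space:]]*//'
}

# Demo on a temporary file with the default Zookeeper settings.
demo=$(mktemp)
printf '%s\n' '# the directory where the snapshot is stored.' \
  'dataDir=/tmp/zookeeper' 'clientPort=2181' > "$demo"
prop_value "$demo" dataDir      # prints /tmp/zookeeper
prop_value "$demo" clientPort   # prints 2181
rm -f "$demo"
```

Against the real installation, the call would be prop_value config/zookeeper.properties dataDir, run from the Kafka directory.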

Start the Zookeeper service using the command below:

sudo bin/zookeeper-server-start.sh config/zookeeper.properties

If you want to stop the Zookeeper server, run the command below:

bin/zookeeper-server-stop.sh

If you stopped it, start the Zookeeper service again.

Once the Zookeeper service has started successfully, you can go ahead and start the Kafka service.

Setting Up a Kafka Cluster
To set up multiple brokers on a single node, each broker needs its own server properties file. Each properties file defines different values for the following properties:
1. broker.id
2. listeners (the port)
3. log.dirs

As we will have 3 brokers, we need a properties file for each broker. The default config/server.properties serves the first broker, so let’s copy it to create the files for the other two instances.

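cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties

The default config/server.properties keeps these values for the first broker (uncomment the listeners line if it starts with #):

broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs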

Edit both new properties files as follows. The broker.id must be unique for each node in the cluster, and each broker also needs its own port and log directory.

nano config/server-1.properties

config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1

nano config/server-2.properties

config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2
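The same edits can be generated instead of typed by hand. The sketch below assumes the numbering scheme used in this article (broker N listens on port 9092+N and logs to /tmp/kafka-logs-N); make_broker_config is a hypothetical helper, and the demo runs on a temporary three-line stand-in for server.properties rather than the real file.

```shell
# Write server-N.properties next to the base file for broker N.
# Assumes broker N uses port 9092+N and log directory /tmp/kafka-logs-N.
make_broker_config() {
  base=$1
  id=$2
  out="$(dirname "$base")/server-$id.properties"
  port=$((9092 + $id))
  sed -E \
    -e "s|^broker\.id=.*|broker.id=$id|" \
    -e "s|^#?listeners=.*|listeners=PLAINTEXT://:$port|" \
    -e "s|^log\.dirs=.*|log.dirs=/tmp/kafka-logs-$id|" \
    "$base" > "$out"
  echo "$out"
}

# Demo on a throwaway directory so nothing under /usr/local/kafka is touched.
dir=$(mktemp -d)
printf '%s\n' 'broker.id=0' '#listeners=PLAINTEXT://:9092' \
  'log.dirs=/tmp/kafka-logs' > "$dir/server.properties"
for n in 1 2; do
  cat "$(make_broker_config "$dir/server.properties" "$n")"
done
rm -rf "$dir"
```

The listeners pattern also matches the commented-out form, so the generated files always carry an active listeners line.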

Starting the 3 brokers in the cluster
Now, we can start our brokers. Run these three commands on different terminal sessions.

Start the Kafka default Server

sudo bin/kafka-server-start.sh config/server.properties

Start the Kafka Server-1

sudo bin/kafka-server-start.sh config/server-1.properties

Start the Kafka Server-2

sudo bin/kafka-server-start.sh config/server-2.properties
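As an alternative to three terminal sessions, kafka-server-start.sh accepts a -daemon flag that runs the broker in the background. The following sketch loops over the three config files; start_brokers and the DRY_RUN switch are hypothetical, and /usr/local/kafka is the install path assumed earlier in this article. With DRY_RUN=1 it only prints the commands, so it can be checked before anything is started.

```shell
# Start one broker per config file in the background.
# With DRY_RUN=1 the commands are only printed, not executed.
start_brokers() {
  kafka_home=$1
  shift
  for cfg in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$kafka_home/bin/kafka-server-start.sh -daemon $kafka_home/config/$cfg"
    else
      "$kafka_home/bin/kafka-server-start.sh" -daemon "$kafka_home/config/$cfg"
    fi
  done
}

DRY_RUN=1
start_brokers /usr/local/kafka server.properties server-1.properties server-2.properties
```

Set DRY_RUN=0 to actually launch the brokers; bin/kafka-server-stop.sh stops them again.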

Creating a Kafka topic
Kafka stores and organizes messages in collections known as topics. A topic is divided into partitions, each of which contains a subset of the topic’s messages. A broker can host multiple partitions. Why are there multiple partitions for a topic? Primarily to increase throughput, because the partitions can be accessed in parallel.
Let’s create a topic with replication factor 3 and 1 partition, since we have 3 Kafka brokers running. We will name the topic AshishJadhavsclusterTopic.

The partition count sets how many pieces the topic’s data is split into; the partitions are spread across the brokers. It is not limited by the number of brokers, but with 3 brokers, 3 partitions are enough to place one on each broker.
The replication factor sets how many copies of the data are kept. This is helpful when a broker goes down, because the other brokers can take over. It cannot exceed the number of brokers, so here the maximum is 3.
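The rule in the previous paragraph can be written down as a quick check before running the create command. This is only an illustration of the constraint; can_create_topic is a hypothetical function, not a Kafka tool.

```shell
# Succeed only if a topic with this replication factor and partition count
# could be created on a cluster with the given number of live brokers.
can_create_topic() {
  brokers=$1
  replication=$2
  partitions=$3
  [ "$partitions" -ge 1 ] && [ "$replication" -ge 1 ] && [ "$replication" -le "$brokers" ]
}

can_create_topic 3 3 1 && echo "ok: 3 replicas on 3 brokers"
can_create_topic 3 3 6 && echo "ok: 6 partitions on 3 brokers"
can_create_topic 3 4 1 || echo "rejected: 4 replicas need at least 4 brokers"
```

Kafka itself enforces the same limit: a create request whose replication factor is larger than the number of available brokers is refused.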

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic AshishJadhavsclusterTopic

Let’s verify that the topic was created:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic AshishJadhavsclusterTopic

Let’s create another topic with replication factor 3 and 2 partitions, since we have 3 Kafka brokers running, and name it AshishJadhavsclusterPartionTopic.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic AshishJadhavsclusterPartionTopic

Let’s verify that the topic was created:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic AshishJadhavsclusterPartionTopic

Starting a producer for sending messages
A producer is an entity or application that publishes data to a Kafka cluster, which is made up of brokers. A producer can publish to multiple topics. You can define what your topics are and which topics a producer publishes to. The broker is responsible for receiving and storing the data when a producer publishes.
Let’s push a message with a producer to the AshishJadhavsclusterPartionTopic topic we created. The producer feeds data into the Kafka cluster, and this command publishes what you type into the cluster.
Note: the broker-list option takes the list of brokers which we have created.

Create a producer:

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic AshishJadhavsclusterPartionTopic

or

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic AshishJadhavsclusterPartionTopic
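The console producer reads messages from standard input, one per line, so test data can be piped in rather than typed. Here is a small sketch of a message generator; gen_messages and the message-N format are hypothetical.

```shell
# Emit COUNT numbered test messages, one per line.
gen_messages() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    echo "message-$i"
    i=$(($i + 1))
  done
}

gen_messages 3
```

With the brokers running, the output can be sent to the topic by piping it into the producer command shown above, for example: gen_messages 100 | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic AshishJadhavsclusterPartionTopic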

Starting a consumer for consuming messages
Applications that need to read data from Kafka use a KafkaConsumer to subscribe to Kafka topics and receive messages from these topics.
Let’s pull the messages we published in the previous step from the AshishJadhavsclusterPartionTopic topic with a consumer.
We will create 3 consumers that pull the messages published to the topic on the Kafka brokers.

Consumer #1: run this command to consume the messages.
Note: bootstrap-server is one of the brokers which we have created. It can be any of our 3 brokers.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic AshishJadhavsclusterPartionTopic --from-beginning

Consumer #2: run this command to consume the messages.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic AshishJadhavsclusterPartionTopic --from-beginning

Consumer #3: run this command to consume the messages.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic AshishJadhavsclusterPartionTopic --from-beginning
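
On Kafka versions older than 2.0, the consumer client could also connect through Zookeeper. The --zookeeper option was removed from the console consumer in Kafka 2.0, so this form does not work with the 2.7.0 release used here:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic AshishJadhavsclusterPartionTopic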

I hope the steps in this article help you set up and configure a Kafka cluster on Ubuntu 18.04. Please experiment and share your feedback.

Thank you for reading.
Ashish Jadhav

Written by Ashish Jadhav

Ashish is a well-known open source technology leader with expertise in Microservices, Blockchain & DevOps. He is a speaker and blogger on open source technologies.