Creating a Kafka Cluster on Ubuntu 18.04
This tutorial is designed for readers with basic knowledge of Kafka, Ubuntu, and general programming. You can read more about Kafka on the official site https://kafka.apache.org.
In this article, we will discuss how to set up a Kafka cluster with 3 brokers on a single node.
Kafka is a distributed messaging system based on the principle of the pub-sub (publish-subscribe) model. It allows us to publish and subscribe to a stream of records that can be categorized. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on the disk and replicated within the cluster to prevent data loss. It is an incredibly fast, highly scalable, fault-tolerant system, and it’s designed to process large amounts of data in real time.
Advantages:
· Reliability − Kafka is distributed, partitioned, replicated, and fault tolerant.
· Scalability − the Kafka messaging system scales easily without downtime.
· Durability − Kafka uses a distributed commit log, which means messages are persisted on disk as quickly as possible, so they are durable.
· Performance − Kafka has high throughput for both publishing and subscribing. It maintains stable performance even when many terabytes of messages are stored.
The basic architecture components of Kafka are as follows:
Zookeeper: a coordinator between brokers and clusters.
Topic: a category to which messages are published by the message producers.
Brokers: the instances that handle message reads and writes.
Producers: clients that publish data into the cluster.
Consumers: clients that read data from the cluster.
Installing Apache Kafka
Apache Kafka requires Java to be installed on your Ubuntu 18.04 machine. First, update the OS with the following command:
$ sudo apt-get update
After the OS is updated, check whether Java is installed; Kafka needs a Java runtime environment. Verify the Java version:
java --version
If Java is not currently installed, you’ll see the following output:
Command ‘java’ not found, but can be installed with:
apt install default-jre
apt install openjdk-11-jre-headless
apt install openjdk-8-jre-headless
Installing Java
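Any of the suggested packages will work; the headless OpenJDK 11 runtime, for example, is enough to run Kafka:
sudo apt install openjdk-11-jre-headless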
Downloading Kafka
Next, download the Kafka binaries to your Ubuntu 18.04 machine. It’s highly recommended to download them from the official website of Apache Kafka; you can also select any nearby mirror.
wget http://www-us.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz
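If the mirror above no longer hosts this version, the same file should be available from the Apache archive:
wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz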
Then extract the file by the following command
tar -xzf kafka_2.13-2.7.0.tgz
Let us keep Kafka under the /usr/local directory, at /usr/local/kafka. Move the extracted directory there:
sudo mv kafka_2.13-2.7.0 /usr/local/kafka
Change into the Kafka directory and list the Kafka script and config files.
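All of the commands that follow are run from this directory:
cd /usr/local/kafka
ls bin config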
Start Zookeeper Service
Zookeeper is a key-value store used to maintain server state, and it is mandatory for running Kafka. It is a centralized service for maintaining configuration, and it also handles leader election among the brokers.
Go to the Kafka directory and start the Zookeeper service using the command below. Before that, you can check the zookeeper.properties file for any changes you want to make, such as dataDir:
nano config/zookeeper.properties
By default, Zookeeper will listen on *:2181/tcp.
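For reference, the default zookeeper.properties shipped with Kafka 2.7.0 looks roughly like this (exact contents may vary slightly by version):
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0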
Start the Zookeeper service using the command below:
sudo bin/zookeeper-server-start.sh config/zookeeper.properties
If you want to stop the Zookeeper server, run the command below:
bin/zookeeper-server-stop.sh
Then start the Zookeeper service again.
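Note that zookeeper-server-start.sh runs in the foreground and occupies the terminal. To run it in the background instead, pass the -daemon flag:
sudo bin/zookeeper-server-start.sh -daemon config/zookeeper.properties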
Once the Zookeeper service has started successfully, you can go ahead and start the Kafka service.
Setting Up a Kafka Cluster
To set up multiple brokers on a single node, a separate server properties file is required for each broker. Each file defines different values for the following properties:
1. broker.id
2. port (set via listeners)
3. log.dirs
As we will have 3 brokers, we will create a properties file for each broker. Let’s copy the config/server.properties file to create one file per additional instance.
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
config/server.properties:
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
Edit the two new properties files as follows. For server-1:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
The broker.id must be unique; it names the node in the cluster. We must also change the port and the log directory.
nano config/server-1.properties
Edit the config for server-2 as follows:
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2
nano config/server-2.properties
Starting the 3 brokers in the cluster
Now we can start our brokers. Run these three commands in different terminal sessions.
Start the default Kafka server
sudo bin/kafka-server-start.sh config/server.properties
Start the Kafka Server-1
sudo bin/kafka-server-start.sh config/server-1.properties
Start the Kafka Server-2
sudo bin/kafka-server-start.sh config/server-2.properties
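If you’d rather not keep three terminals open, the same script accepts a -daemon flag; the broker logs then go to the logs/ directory instead of the console:
sudo bin/kafka-server-start.sh -daemon config/server.properties
sudo bin/kafka-server-start.sh -daemon config/server-1.properties
sudo bin/kafka-server-start.sh -daemon config/server-2.properties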
Creating a Kafka topic
Kafka stores and organizes messages as collections known as topics. A topic is divided into partitions, each containing a subset of the topic’s messages. A broker can host multiple partitions. Why have multiple partitions for a topic? Primarily to increase throughput: the topic can be accessed in parallel.
Let’s create a topic with replication factor 3 and 1 partition, since we have 3 Kafka brokers running. We will name the topic AshishJadhavsclusterTopic.
The partition count controls how many pieces the topic’s data is split into across the brokers; more partitions allow more parallel reads and writes.
The replication factor controls how many copies of the data are kept, which helps when a broker goes down: the remaining brokers can take over its work. Since we have 3 brokers, the replication factor can be at most 3.
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic AshishJadhavsclusterTopic
Let’s verify that the topic was created:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic AshishJadhavsclusterTopic
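The output should look roughly like the following; the leader and the replica/ISR order may differ on your machine:
Topic: AshishJadhavsclusterTopic  PartitionCount: 1  ReplicationFactor: 3  Configs:
    Topic: AshishJadhavsclusterTopic  Partition: 0  Leader: 0  Replicas: 0,1,2  Isr: 0,1,2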
Let’s create a topic with replication factor 3 and 2 partitions, since we have 3 Kafka brokers running, and name it AshishJadhavsclusterPartionTopic.
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic AshishJadhavsclusterPartionTopic
Let’s verify that the topic was created:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic AshishJadhavsclusterPartionTopic
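With 2 partitions, the describe output should show one line per partition, with leadership spread across the brokers, for example:
Topic: AshishJadhavsclusterPartionTopic  PartitionCount: 2  ReplicationFactor: 3  Configs:
    Topic: AshishJadhavsclusterPartionTopic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
    Topic: AshishJadhavsclusterPartionTopic  Partition: 1  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1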
Starting a producer for sending messages
A producer is an entity or application that publishes data to a Kafka cluster, which is made up of brokers. A producer can publish to multiple topics; you define what your topics are and which topics a producer publishes to. The brokers are responsible for receiving and storing the data when a producer publishes it.
Let’s push a message with the producer to the AshishJadhavsclusterPartionTopic topic we created. The producer feeds data into the Kafka cluster; this command will publish data into the cluster.
Note: the --broker-list option takes the list of brokers we have created.
Create a producer:
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic AshishJadhavsclusterPartionTopic
or
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic AshishJadhavsclusterPartionTopic
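Once the producer starts, it shows a > prompt. Type a few messages, one per line; the two messages below are just placeholders, type anything you like:
>Hello from broker land
>This is the second message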
Starting a consumer for consuming messages
Applications that need to read data from Kafka use a KafkaConsumer to subscribe to Kafka topics and receive messages from these topics.
Let’s pull the messages with consumers from the AshishJadhavsclusterPartionTopic topic, to which we published in the previous step.
We will create 3 consumers pulling messages from the topic published on the Kafka brokers.
Consumer #1: run this command to consume the messages.
Note: --bootstrap-server is one of the brokers we have created; it can be any of our 3 brokers.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic AshishJadhavsclusterPartionTopic --from-beginning
Consumer #2: run this command to consume the messages.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic AshishJadhavsclusterPartionTopic --from-beginning
Consumer #3: run this command to consume the messages.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic AshishJadhavsclusterPartionTopic --from-beginning
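Each consumer should print every message published to the topic, for example:
Hello from broker land
This is the second message
Since the topic has 2 partitions, messages may appear in a different order than they were sent; Kafka only guarantees ordering within a single partition.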
Older Kafka versions also allowed the consumer client to consume via Zookeeper ($ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic AshishJadhavsclusterPartionTopic), but the --zookeeper option was removed in Kafka 2.0, so with Kafka 2.7.0 use --bootstrap-server as shown above.
I hope the steps mentioned in this article help you set up and configure a Kafka cluster on Ubuntu 18.04. Please experiment and share your feedback.
Thank you for reading. Ashish Jadhav