
Introduction


 

In this blog post I will talk about Kafka and how you can configure it both locally and in the Cloud Foundry environment. To demo it, a Java Spring Boot app will be used, along with the Kafka service for the cloud part and Docker for the local setup.

 

What Is It?


 

Kafka, ZooKeeper, topic, partitions, records, consumer, producer, acknowledgment: maybe all these terms ring a bell, but you haven't had a chance to understand their purpose and how they relate to Kafka. Hopefully after reading this blog you'll have a better understanding.

I haven't answered the question above yet, so let's fetch a definition from Wikipedia:

"Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds."

I can feel you rolling your eyes already. At first sight it seems like this phrase was generated with a buzzword website like this one: https://www.kopf.com.br/buzzwordmaster/buzzword-generator.html. After translating the sentence above into plain English, we get this result:

Apache Kafka is software whose source code is made freely available and may be redistributed and modified according to the user's requirements, and which takes care of processing data in motion, or in other words, computing on data directly as it is produced or received. It is developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified real-time data system that processes millions of messages per second with limited resources. Due to its nature it handles big data systems really well.

Now let's dive into the definitions of Kafka's main terminology:

  • Producers: produce/create/send data

  • Consumers: read/ingest data

  • Streams: transform data

  • Records: Kafka organizes data into records. Records are transported in a key-value format, where the data is contained in the value and the key acts as the identifier; a record also carries some time information

  • Topics: act like labels for records. Consumers and producers communicate with each other via the topic name

  • Partitions: topics are divided into partitions, which are just files to which records are appended. Partitions permit records to be processed in parallel and maintain an order per partition. Order is not guaranteed across partitions; records are distributed across partitions by calculating the hash of the key

  • ZooKeeper: acts like a manager that keeps track of and coordinates everything, from Kafka's cluster nodes to topics and partitions

  • Acknowledgment: after the records/messages are processed by the consumer, an acknowledgment is sent back to Kafka
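
To make these terms concrete, here is a minimal producer sketch using the plain Kafka Java client; the topic name, key, and value are just illustrative:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address used for the initial connection
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A record is a key-value pair; the partition is chosen by hashing the key
            producer.send(new ProducerRecord<>("demo-topic", "order-42", "created"));
        }
    }
}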


 

What Are the Use Cases?


 

  • Big data ingestion - suits IoT projects, distributed systems, and cloud-native architectures in general very well

  • You have a lot of microservices which need to exchange large amounts of data in an asynchronous way

  • Processing lots of data in real time

  • Log aggregation

  • Analytics


 

How Can It Be Used?


 

SAP Cloud Platform - Cloud Foundry Setup


 

Unfortunately, the SAP Cloud Platform Kafka service can be used only for internal product development. It's available only via an IT ticket request with a solid reason, so you're not able to assign the Kafka service to your global account until you have the approval.

From SAP's internal price list wiki: "Kafka is offered in a restricted manner. Prior to usage of Kafka in your product & commercialization efforts: Please get confirmation that the required setup can be delivered from Thomas Heinze."

After you get the approval, you will need to create a Kafka service instance, just like for the other services.

Go to the desired global account -> service marketplace -> Apache Kafka -> create a service instance, select an appropriate name and plan, and that's it. Reference the service name in your microservices (via the mta.yaml or manifest.yaml), as sketched below, and you are ready to use Kafka in your cloud environment.
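
As an illustration, here is a hedged sketch of how the service instance could be referenced in an mta.yaml; the module name, instance name, and plan are placeholders that depend on what your approval grants:

modules:
  - name: my-kafka-app                  # placeholder module name
    type: java
    requires:
      - name: my-kafka-instance         # binds the service instance to the app

resources:
  - name: my-kafka-instance             # placeholder instance name
    type: org.cloudfoundry.managed-service
    parameters:
      service: kafka                    # name in the service marketplace
      service-plan: standard            # plan depends on your approval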

 

Docker Kafka Image - Local Setup


 

OK, so above we saw how we can configure Kafka for our SAP Cloud Foundry account, but what if we want to play around with our microservices locally? Well, for this we shall use Docker to create the messaging system through which they communicate. Below you'll find the details.

 

Local Dependencies Setup Using Docker

First of all, what is Docker? Docker is a tool that allows packaging libraries and dependencies into isolated environments called containers. For example, if your application needs a specific version of Node.js in order to run locally, you wouldn't have to worry that your developers don't have that specific one installed on their machines. You simply pack it in a Docker config file, along with other necessary dependencies (such as Kafka, Redis, PostgreSQL), and start it up. No more "it works on my machine" excuses.
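
As a hedged illustration of that idea, a compose-style sketch that pins a Node.js version for the whole team (the image tag, service name, and paths are placeholders):

services:
  app:
    image: node:14-alpine       # every developer runs exactly this Node.js version
    working_dir: /usr/src/app
    volumes:
      - .:/usr/src/app          # mount the project into the container
    command: npm start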

Below is a Docker Compose configuration for setting up Redis, ZooKeeper, and a Kafka broker that can be accessed from outside the Docker container.

 

Docker Installation

In order to start up the Docker containers via the docker-compose file, you first need to have Docker installed.

See the installation steps for your operating system in the Docker documentation.

Start Services:

1) Start Docker Desktop

2) Start up the Docker containers by executing one of the following commands:

docker-compose up -d (detached mode, runs in the background)

docker-compose up (attached mode, streams the logs)

 

Stop Services:
docker-compose down

Note: Don't interrupt the terminal by pressing ctrl+c, ctrl+z, or ctrl+d to stop the containers. You need to use docker-compose down; otherwise you will run into errors the next time you need to start the services.

Docker Compose Explained

The Kafka Docker image used is referenced and downloaded from Docker Hub.
kafka:
  image: wurstmeister/kafka:2.11-2.0.0
  depends_on:
    - zookeeper
  ports:
    - "9092:9092"
  expose:
    - "9093"
  environment:
    KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
    KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
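
The kafka service above depends on a zookeeper service that is not shown in the snippet; a minimal sketch of what it could look like, assuming the wurstmeister/zookeeper image from Docker Hub:

zookeeper:
  image: wurstmeister/zookeeper   # ZooKeeper image from the same maintainer
  ports:
    - "2181:2181"                 # default ZooKeeper client port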

Let's try to explain each of the Kafka environment variables from above:

  • KAFKA_ADVERTISED_LISTENERS - the list of available addresses that point to the Kafka broker. Kafka will send them to clients on their initial connection

  • KAFKA_LISTENERS - the list of addresses (0.0.0.0:9093, 0.0.0.0:9092) and listener names (INSIDE, OUTSIDE) on which the Kafka broker will listen for incoming connections

  • KAFKA_LISTENER_SECURITY_PROTOCOL_MAP - maps the listener names defined above (INSIDE, OUTSIDE) to the PLAINTEXT Kafka protocol

  • KAFKA_INTER_BROKER_LISTENER_NAME - points to the listener name that will be used for cross-broker communication


Here we defined two listeners (INSIDE://0.0.0.0:9093, OUTSIDE://0.0.0.0:9092). The INSIDE listener is used for communication inside the Docker network, while the other is used for calls external to it. For connecting a producer/consumer that resides outside of the container, you need to connect it to localhost:9092; otherwise you should use kafka:9093. Each Docker container on the same network will use the hostname of the Kafka broker container to reach it; in our case it's called kafka.

Topics are managed by Kafka, which is a service running in a Docker container. Since the Docker image comes with a Kafka server, we can execute the scripts that ship with it by prefixing them with docker exec. Below you can find how to publish messages to or consume messages from the Kafka broker.

 

View All Topics

docker exec -t docker-images_kafka_1 kafka-topics.sh --list --zookeeper zookeeper:2181

Creating a Topic

docker exec -t docker-images_kafka_1 kafka-topics.sh --create --topic <topicName> --partitions <numberOfPartitions> --replication-factor 1 --zookeeper zookeeper:2181

Publish Message to Topic Inside Docker

The producer and Kafka broker are inside the Docker container.

docker exec -it docker-images_kafka_1 kafka-console-producer.sh --broker-list kafka:9093 --topic <topicName>

Publish Message to Topic Outside of Docker

docker exec -it docker-images_kafka_1 kafka-console-producer.sh --broker-list localhost:9092 --topic <topicName>

Consume Message from Topic Inside of Docker

docker exec -it docker-images_kafka_1 kafka-console-consumer.sh --bootstrap-server kafka:9093 --topic <topicName> --from-beginning

Consume Message from Topic Outside of Docker

The consumer is outside, the Kafka broker is inside the Docker network.

docker exec -t docker-images_kafka_1 kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topicName> --from-beginning

Now that we have our Kafka container set up, let's see what we need to configure in our Spring Boot app so that our microservices can connect to it.

I won't go into all the details of how you can configure a Spring Boot application with a Kafka producer/consumer or the pom configuration, as there are plenty of blogs and tutorials on this topic with nice explanations of how Apache Kafka can be set up in a Spring application.

I'll just show you how you can connect from a Java Spring Boot app to a Kafka container running locally in Docker.

 

After the container has started successfully (you can check the status with docker-compose ps), you need to add the following configuration to the application-local.yaml file of your Java microservice. This tells the microservice which server it needs to connect to.
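
As a hedged sketch, assuming the standard spring.kafka.* properties of Spring for Apache Kafka and the local broker from the Docker setup above, such a configuration could look like this:

spring:
  kafka:
    bootstrap-servers: localhost:9092   # OUTSIDE listener of the local broker
    consumer:
      group-id: blogPostgroup           # matches the groupId used by the listener
      auto-offset-reset: earliest       # start from the beginning of the topic
    listener:
      type: batch                       # deliver records to the listener as a List
      ack-mode: manual                  # required for the Acknowledgment parameter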


 

Let's produce some records from our Kafka container on the topic BlogPost-topic-demo. Prior to this I have configured a listener method in my Spring Boot application. This is how the method looks:

 
@KafkaListener(id = "test-container",
        topicPattern = "BlogPost-topic-demo",
        groupId = "blogPostgroup")
public void handleTest(@Payload List<String> payloads,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
        Acknowledgment acknowledgment) {
    for (String payload : payloads) {
        System.out.println("Received record " + payload + " on topic " + topic);
    }
    // Tell Kafka the batch was processed successfully
    acknowledgment.acknowledge();
}
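
Note that receiving the payloads as a List requires the batch listener mode, and the Acknowledgment parameter is only injected when manual acknowledgment is enabled; both are assumed in the configuration sketch above.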

 

Taking the command from above that lets us publish to the topic from outside the Docker container (producing messages for the Java consumer microservice) and writing some records to it, the listener picks them up and logs them.

 



Summary


 

We've talked about what Kafka is and how you can use it to take your data processing to the next level.

As a side note I think it's important to mention that although Kafka has a lot of advantages, it can also bring a lot of complexity and unnecessary overhead into your project. Make sure it matches your project needs and don't forget to take into account future growth plans.

Below are some of the cases in which you should consider using something else for asynchronous communication:

  • You don't have millions of requests that need to be processed in a short amount of time

  • You don't expect your workload to grow exponentially over the course of the next years

  • You need to process all the messages in one strict global order (Kafka only guarantees order within a partition)

  • You have only one producer and one consumer

  • All you need is a task queue; in that case, consider using RabbitMQ instead


 

What other topics would you like me to write about? Please let me know in the comment section below. My previous blog post is about the Destination service from SAP Cloud Foundry: https://blogs.sap.com/2020/10/09/destination-service-how-it-can-be-used/comment-page-1/#comment-5332....

My experience is in Cloud, Distributed Systems, Architecture, NodeJS, SAPUI5, Java, Spring, Docker, SAP Cloud Application Programming Model, SAP HANA, PostgreSQL.