
Introduction


 

In this blog post I will talk about Kafka and how you can configure it both locally and in the Cloud Foundry environment. To demo it, a Java Spring Boot app will be used, along with the Kafka service for the cloud part and Docker for the local setup.

 

What Is It?


 

Kafka, ZooKeeper, topic, partitions, records, consumer, producer, acknowledgment: maybe all these terms ring a bell, but you haven't had a chance to understand their purpose and how they relate to Kafka. Hopefully after reading this blog you'll have a better understanding.

I haven't answered the question above yet, so let's fetch a definition from Wikipedia:

"Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds."

I can feel you rolling your eyes already. At first sight it seems like this phrase was generated with a buzzword website like this one: https://www.kopf.com.br/buzzwordmaster/buzzword-generator.html. After translating the sentence above into plain English, we get this result:

Apache Kafka is software whose source code is made freely available and may be redistributed and modified according to the user's requirements, and which takes care of processing data in motion, or in other words, computing on data directly as it is produced or received. It is developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified real-time data system that processes millions of messages per second with limited resources. Due to its nature it handles big data systems really well.

Now let's dive into the definitions of Kafka's main terminology:

  • Producers: produce/create/send data

  • Consumers: read/ingest data

  • Streams: transform data

  • Records: Kafka organizes data into records. Records are transported in a key-value format, where the data is contained in the value and the key acts as the identifier; a record also carries some time information

  • Topics: act like labels for records. Consumers and producers communicate with each other via the topic name

  • Partitions: topics are divided into partitions, which are just files to which records are appended. Partitions permit records to be processed in parallel and maintain an order per partition. Order is not guaranteed across partitions; records are distributed across partitions by calculating the hash of the key

  • ZooKeeper: acts like a manager that keeps track of and coordinates everything, from Kafka's cluster nodes to topics and partitions

  • Acknowledgment: after the records/messages are processed by the consumer, an acknowledgment is sent back to Kafka
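
To make these terms concrete, here is a minimal producer sketch using the plain Kafka Java client; the topic name, key, and value are just illustrative:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address used for the initial connection
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A record is a key-value pair; the partition is chosen by hashing the key
            producer.send(new ProducerRecord<>("demo-topic", "order-42", "created"));
        }
    }
}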


 

What Are the Use Cases?


 

  • Big data ingestion - suits IoT projects, distributed systems, and cloud-native architectures in general very well

  • You have a lot of microservices which need to exchange large amounts of data in an asynchronous way

  • Processing lots of data in real time

  • Log aggregation

  • Analytics


 

How Can It Be Used?


 

SAP Cloud Platform - Cloud Foundry Setup


 

Unfortunately, the SAP Cloud Platform Kafka service can be used only for internal product development. It's available only via an IT ticket request with a solid reason, so you're not able to assign the Kafka service to your global account until you have the approval.

From SAP's internal price list wiki: "Kafka is offered in a restricted manner. Prior to usage of Kafka in your product & commercialization efforts: Please get confirmation that the required setup can be delivered from Thomas Heinze."

After you get the approval, you will need to create a Kafka service instance, just like for the other services.

Go to the desired global account -> service marketplace -> Apache Kafka -> create a service instance, select an appropriate name and plan, and that's it. Reference the service name in your microservices (via the mta.yaml or manifest.yaml), as sketched below, and you are ready to use Kafka in your cloud environment.
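
As an illustration, here is a hedged sketch of how the service instance could be referenced in an mta.yaml; the module name, instance name, and plan are placeholders that depend on what your approval grants:

modules:
  - name: my-kafka-app                  # placeholder module name
    type: java
    requires:
      - name: my-kafka-instance         # binds the service instance to the app

resources:
  - name: my-kafka-instance             # placeholder instance name
    type: org.cloudfoundry.managed-service
    parameters:
      service: kafka                    # name in the service marketplace
      service-plan: standard            # plan depends on your approval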

 

Docker Kafka Image - Local Setup


 

OK, so above we saw how we can configure Kafka for our SAP Cloud Foundry account, but what if we want to play around with our microservices locally? Well, for this we shall use Docker to create the messaging system through which they communicate. Below you'll find the details.

 

Local Dependencies Setup Using Docker

First of all, what is Docker? Docker is a tool that allows packaging libraries and dependencies into isolated environments called containers. For example, if your application needs a specific version of Node.js in order to run locally, you wouldn't have to worry that your developers don't have that specific one installed on their machines. You simply pack it in a Docker config file, along with other necessary dependencies (such as Kafka, Redis, PostgreSQL), and start it up. No more "it works on my machine" excuses.
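
As a hedged illustration of that idea, a compose-style sketch that pins a Node.js version for the whole team (the image tag, service name, and paths are placeholders):

services:
  app:
    image: node:14-alpine       # every developer runs exactly this Node.js version
    working_dir: /usr/src/app
    volumes:
      - .:/usr/src/app          # mount the project into the container
    command: npm start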

Below is a Docker Compose configuration for setting up Redis, ZooKeeper, and a Kafka broker that can be accessed from outside the Docker container.

 

Docker Installation

In order to start up the Docker containers via the docker-compose file, you first need to have Docker installed.

See the installation steps for your operating system in the Docker documentation.

Start Services:

1) Start Docker Desktop

2) Start up the Docker containers by executing one of the following commands:

docker-compose up -d (detached mode, runs in the background)

docker-compose up (attached mode, streams the logs)

 

Stop Services:
docker-compose down

Note: Don't interrupt the terminal by pressing ctrl+c, ctrl+z, or ctrl+d to stop the containers. You need to use docker-compose down; otherwise you will run into errors the next time you need to start the services.

Docker Compose Explained

The Kafka Docker image used is referenced and downloaded from Docker Hub.
kafka:
  image: wurstmeister/kafka:2.11-2.0.0
  depends_on:
    - zookeeper
  ports:
    - "9092:9092"
  expose:
    - "9093"
  environment:
    KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
    KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
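
The kafka service above depends on a zookeeper service that is not shown in the snippet; a minimal sketch of what it could look like, assuming the wurstmeister/zookeeper image from Docker Hub:

zookeeper:
  image: wurstmeister/zookeeper   # ZooKeeper image from the same maintainer
  ports:
    - "2181:2181"                 # default ZooKeeper client port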

Let's try to explain each of the Kafka environment variables from above:

  • KAFKA_ADVERTISED_LISTENERS - the list of available addresses that point to the Kafka broker. Kafka will send them to clients on their initial connection

  • KAFKA_LISTENERS - the list of addresses (0.0.0.0:9093, 0.0.0.0:9092) and listener names (INSIDE, OUTSIDE) on which the Kafka broker will listen for incoming connections

  • KAFKA_LISTENER_SECURITY_PROTOCOL_MAP - maps the listener names defined above (INSIDE, OUTSIDE) to the PLAINTEXT Kafka protocol

  • KAFKA_INTER_BROKER_LISTENER_NAME - points to the listener name that will be used for cross-broker communication


Here we defined two listeners (INSIDE://0.0.0.0:9093, OUTSIDE://0.0.0.0:9092). The INSIDE listener is used for communication inside the Docker network, while the other is used for calls external to it. For connecting a producer/consumer that resides outside of the container, you need to connect it to localhost:9092; otherwise you should use kafka:9093. Each Docker container on the same network will use the hostname of the Kafka broker container to reach it; in our case it's called kafka.

Topics are managed by Kafka, which is a service running in a Docker container. Since the Docker image comes with a Kafka server, we can execute the scripts that ship with it by prefixing them with docker exec. Below you can find how to publish messages to or consume messages from the Kafka broker.

 

View All Topics

docker exec -t docker-images_kafka_1 kafka-topics.sh --list --zookeeper zookeeper:2181

Creating a Topic

docker exec -t docker-images_kafka_1 kafka-topics.sh --create --topic <topicName> --partitions <numberOfPartitions> --replication-factor 1 --zookeeper zookeeper:2181

Publish Message to Topic Inside Docker

The producer and Kafka broker are inside the Docker container.

docker exec -it docker-images_kafka_1 kafka-console-producer.sh --broker-list kafka:9093 --topic <topicName>

Publish Message to Topic Outside of Docker

docker exec -it docker-images_kafka_1 kafka-console-producer.sh --broker-list localhost:9092 --topic <topicName>

Consume Message from Topic Inside of Docker

docker exec -it docker-images_kafka_1 kafka-console-consumer.sh --bootstrap-server kafka:9093 --topic <topicName> --from-beginning

Consume Message from Topic Outside of Docker

The consumer is outside, the Kafka broker is inside the Docker network.

docker exec -t docker-images_kafka_1 kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topicName> --from-beginning

Now that we have our Kafka container set up, let's see what we need to configure in our Spring Boot app so that our microservices can connect to it.

I won't go into all the details of how you can configure a Spring Boot application with a Kafka producer/consumer or the pom configuration, as there are plenty of blogs and tutorials on this topic with nice explanations of how Apache Kafka can be set up in a Spring application.

I'll just show you how you can connect from a Java Spring Boot app to a Kafka container running locally in Docker.

 

After the container has started successfully (you can check the status with docker-compose ps), you need to add the following configuration to the application-local.yaml file of your Java microservice. This tells the microservice which server it needs to connect to.
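
As a hedged sketch, assuming the standard spring.kafka.* properties of Spring for Apache Kafka and the local broker from the Docker setup above, such a configuration could look like this:

spring:
  kafka:
    bootstrap-servers: localhost:9092   # OUTSIDE listener of the local broker
    consumer:
      group-id: blogPostgroup           # matches the groupId used by the listener
      auto-offset-reset: earliest       # start from the beginning of the topic
    listener:
      type: batch                       # deliver records to the listener as a List
      ack-mode: manual                  # required for the Acknowledgment parameter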


 

Let's produce some records from our Kafka container on the topic BlogPost-topic-demo. Prior to this I have configured a listener method in my Spring Boot application. This is how the method looks:

 
@KafkaListener(id = "test-container",
        topicPattern = "BlogPost-topic-demo",
        groupId = "blogPostgroup")
public void handleTest(@Payload List<String> payloads,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
        Acknowledgment acknowledgment) {
    for (String payload : payloads) {
        System.out.println("Received record " + payload + " on topic " + topic);
    }
    // Tell Kafka the batch was processed successfully
    acknowledgment.acknowledge();
}
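
Note that receiving the payloads as a List requires the batch listener mode, and the Acknowledgment parameter is only injected when manual acknowledgment is enabled; both are assumed in the configuration sketch above.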

 

Taking the command from above that lets us publish to the topic from outside the Docker container (producing messages for the Java consumer microservice) and writing some records to it, the listener picks them up and logs them.

 



Summary


 

We've talked about what Kafka is and how you can use it to take your data processing to the next level.

As a side note I think it's important to mention that although Kafka has a lot of advantages, it can also bring a lot of complexity and unnecessary overhead into your project. Make sure it matches your project needs and don't forget to take into account future growth plans.

Below are some of the cases in which you should consider using something else for asynchronous communication:

  • You don't have millions of requests that need to be processed in a short amount of time

  • You don't expect your workload to grow exponentially over the course of the next years

  • You need to process all the messages in one strict global order (Kafka only guarantees order within a partition)

  • You have only one producer and one consumer

  • All you need is a task queue; in that case, consider using RabbitMQ instead


 

What other topics would you like me to write about? Please let me know in the comment section below. My previous blog post is about the Destination service from SAP Cloud Foundry: https://blogs.sap.com/2020/10/09/destination-service-how-it-can-be-used/comment-page-1/#comment-5332....

My experience is in Cloud, Distributed Systems, Architecture, NodeJS, SAPUI5, Java, Spring, Docker, SAP Cloud Application Programming Model, SAP HANA, PostgreSQL.