This blog post gives a brief overview of the basic concepts in Apache Kafka.
In simple terms, Apache Kafka is designed for distributed, high-throughput systems. It tends to work very well as a replacement for a more traditional message broker. Compared with other messaging systems, Kafka offers better throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message-processing applications.
Next, I'll introduce some of its basic concepts.
Broker: A Kafka server.
Message: Information sent from a producer to a consumer through Kafka.
Producer: An application that sends messages.
Consumer: An application that receives messages.
Cluster: A group of one or more servers (brokers).
A topic is a category name under which messages are stored and published. All Kafka messages are organized into topics. Producer applications write data to topics, and consumer applications read from topics.
Messages published to the cluster stay in the cluster until the retention period has passed.
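To make the retention idea concrete, here is a minimal Python sketch. It is a toy model, not Kafka code: the `RetainedLog` class and its method names are hypothetical, and it drops messages one by one, whereas a real broker removes whole log segments according to its retention settings.

```python
import time
from collections import deque


class RetainedLog:
    """Toy log that discards messages older than a retention period.

    Hypothetical illustration only; not part of any Kafka client library.
    """

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._messages = deque()  # (timestamp, payload) pairs, oldest first

    def publish(self, payload, now=None):
        # Record the message together with the time it was published.
        now = time.time() if now is None else now
        self._messages.append((now, payload))

    def purge_expired(self, now=None):
        # Drop messages from the front while they are older than the cutoff.
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        while self._messages and self._messages[0][0] < cutoff:
            self._messages.popleft()

    def messages(self):
        # Return the payloads that are still retained, oldest first.
        return [payload for _, payload in self._messages]
```

Note that in this sketch, as in Kafka, a message is kept until it expires whether or not any consumer has read it.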
Topics are divided into partitions, which allow a topic's data to be split across multiple brokers.
Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential ID number, called the offset, that uniquely identifies it within the partition. The first message gets offset 0, the next gets offset 1, and so on.
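The append-only behavior of a partition can be sketched in a few lines of Python. This is a simplified in-memory model under my own assumptions (the `PartitionLog` name and its methods are made up for illustration); it only shows how offsets relate to position in the log.

```python
class PartitionLog:
    """Toy append-only partition: a message's offset is its position in the log."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # New messages always go at the end; existing entries are never changed.
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to this message

    def read(self, offset, max_messages=10):
        # Return (offset, message) pairs starting at the requested offset.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self._messages))
        return [(o, self._messages[o]) for o in range(offset, end)]
```

A consumer in this model only needs to remember the next offset it wants to read, which is roughly how Kafka consumers track their position in a partition.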
Producers publish data to the topics of their choice. There are three ways to decide which partition a published message belongs to:
1. Specify the partition ID explicitly
2. Use a semantic partitioning function (e.g., key % number of partitions)
3. Use round-robin to balance the load
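The three strategies above can be sketched as a single selection function. This is an illustrative Python model, not the partitioner of any real Kafka client: `SimplePartitioner` is a hypothetical name, and `zlib.crc32` is used only as a convenient deterministic stand-in for whatever hash function a real client applies to the key.

```python
import itertools
import zlib


class SimplePartitioner:
    """Toy partition chooser covering the three strategies described above."""

    def __init__(self, num_partitions):
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        self.num_partitions = num_partitions
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence for round-robin selection.
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None, partition=None):
        # 1. An explicitly specified partition ID wins.
        if partition is not None:
            if not 0 <= partition < self.num_partitions:
                raise ValueError("partition out of range")
            return partition
        # 2. Otherwise, hash the key so the same key always maps to the same partition.
        #    (crc32 is an assumption made for this sketch.)
        if key is not None:
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        # 3. With no key, rotate through the partitions to balance the load.
        return next(self._round_robin)
```

The key-based branch is what keeps all messages for one key in a single partition, and therefore in order relative to each other.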
Consumers can join a group called a consumer group. A consumer group is the set of consumer processes that subscribe to a specific topic. Each consumer in the group is assigned a set of partitions to consume from, so each one receives messages from a different subset of the topic's partitions. Because each partition is assigned to only one consumer in the group, Kafka guarantees that a message is read by a single consumer in the group.
There are three possible scenarios for the relationship between the number of partitions and the number of consumers:
1. The number of consumers equals the number of topic partitions: each consumer is assigned exactly one partition.
2. The number of consumers is less than the number of topic partitions: multiple partitions are assigned to some of the consumers in the group.
3. The number of consumers is greater than the number of topic partitions: each partition is still assigned to only one consumer, so any extra consumer, such as a ‘Consumer 3’ with no partition left to claim, is idle.
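The three scenarios above can be simulated with a small assignment function. This is a simplified round-robin sketch written for this post; the `assign_partitions` function and the consumer names are hypothetical, and Kafka's actual assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for index, partition in enumerate(partitions):
        # Hand out partitions in turn; a consumer may receive several or none.
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


# Scenario 1: as many consumers as partitions, one partition each.
equal = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
# {'c1': [0], 'c2': [1], 'c3': [2]}

# Scenario 2: fewer consumers than partitions, some consumers own several.
fewer = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
# {'c1': [0, 2], 'c2': [1, 3]}

# Scenario 3: more consumers than partitions, the extra consumer is idle.
more = assign_partitions([0, 1], ["c1", "c2", "c3"])
# {'c1': [0], 'c2': [1], 'c3': []}
```

In every case each partition appears under exactly one consumer, which is the property that lets a group read a message only once.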
That was a brief introduction to the basic concepts of Apache Kafka. I hope it helps you understand Kafka a little better.
Thanks for reading.