In Kafka, I would like to use only a single broker, a single topic, and a single partition, with one producer and multiple consumers (each consumer getting its own copy of the data). Given that, is ZooKeeper still a must for running Kafka?
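For reference, here is a minimal sketch of that consumer setup using the Java client (kafka-clients): each consumer that should receive its own full copy of the data simply runs in its own consumer group, i.e. with a unique `group.id`. The broker address `localhost:9092` and the topic name `my-topic` are placeholders for this example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CopyPerConsumer {
    public static void main(String[] args) {
        // Placeholder broker address and topic name.
        String bootstrap = "localhost:9092";
        String topic = "my-topic";

        // Each consumer that should see every message must be in its OWN
        // consumer group, i.e. use a unique group.id.
        String groupId = args.length > 0 ? args[0] : "consumer-group-1";

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap); // clients talk to the broker, not to ZooKeeper
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of(topic));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("group=%s offset=%d value=%s%n", groupId, r.offset(), r.value());
                }
            }
        }
    }
}
```

Run one instance per group id (e.g. with arguments `consumer-group-1` and `consumer-group-2`) and each instance will receive the full stream from the single partition.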
Firstly
Apache ZooKeeper is a distributed store that provides configuration and synchronization services in a highly available way.
In more recent versions of Kafka, work was done so that client consumers no longer store information about how far they have consumed messages (called offsets) in ZooKeeper. This reduced usage did not, however, get rid of the need for consensus and coordination in distributed systems.
While Kafka provides fault tolerance and resilience, something is still needed to provide the required coordination, and ZooKeeper is that piece of the overall system.
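As an illustration of where offsets live today, a hedged sketch with the Java client: the consumer commits its position to Kafka's internal `__consumer_offsets` topic via `commitSync()`, so no ZooKeeper connection is involved on the client side. The broker address, group id, and topic name below are assumptions for the example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetsLiveInKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "offset-demo");             // arbitrary example group
        props.put("enable.auto.commit", "false");         // commit explicitly below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            // commitSync() stores the consumed offsets in Kafka's internal
            // __consumer_offsets topic -- the client never talks to ZooKeeper.
            consumer.commitSync();
            System.out.println("Committed offsets after reading " + records.count() + " records");
        }
    }
}
```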
Secondly
Agreeing on which broker is the leader of a partition is one practical example of how ZooKeeper is used within the Kafka ecosystem.
ZooKeeper is needed even if there is only a single broker.
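To make the leader-election point concrete, here is a small sketch using the Java AdminClient that prints the leader of each partition; the broker address and topic name are placeholders. Even on a single-broker cluster every partition has a leader, assigned by the controller (which, pre-KIP-500, relies on ZooKeeper).

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowPartitionLeader {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed single local broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("my-topic"))
                                         .all().get()
                                         .get("my-topic");
            // Print which broker currently leads each partition.
            desc.partitions().forEach(p ->
                System.out.printf("partition %d -> leader broker id %d%n",
                                  p.partition(), p.leader().id()));
        }
    }
}
```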
These points are from the Kafka in Action book.
The request to run Kafka without ZooKeeper seems to be quite common. The library Charlatan addresses this.
According to its description, Charlatan is more or less a mock for ZooKeeper, providing the ZooKeeper services backed either by other tools or by a database.
I encountered that library when dealing with the main product of Charlatan's authors; there it works fine …
Important update - August 2019:
ZooKeeper dependency will be removed from Apache Kafka. See the high-level discussion in KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum.
These efforts will take a few Kafka releases and additional KIPs. Kafka Controllers will take over the tasks currently handled by ZooKeeper. The Controllers will leverage the benefits of the Event Log, which is a core concept of Kafka.
Some benefits of the new Kafka architecture are a simpler architecture, ease of operations, and better scalability, e.g. allowing "unlimited partitions".
Update - November 2020:
For the latest version (2.6.0), ZooKeeper is still required for running Kafka, but in the near future ZooKeeper will be replaced with a Self-Managed Metadata Quorum.
See details in the accepted KIP-500.
1. Current status
Kafka uses ZooKeeper to store its metadata about partitions and brokers, and to elect a broker to be the Kafka Controller.
Currently, removing this dependency on ZooKeeper is work in progress (through KIP-500).
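If you want to see that metadata for yourself, here is a rough sketch that uses the plain ZooKeeper Java client to read the znodes Kafka creates, namely /brokers/ids (broker registrations) and /controller (the elected controller). The connection string localhost:2181 assumes a quickstart-style local ZooKeeper.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

public class InspectKafkaZNodes {
    public static void main(String[] args) throws Exception {
        // Connect to the local ZooKeeper used by the Kafka quickstart (assumed address).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        try {
            // Broker registrations live under /brokers/ids.
            System.out.println("registered brokers: " + zk.getChildren("/brokers/ids", false));
            // The currently elected controller is recorded in the /controller znode.
            byte[] controller = zk.getData("/controller", false, null);
            System.out.println("controller: " + new String(controller, StandardCharsets.UTF_8));
        } finally {
            zk.close();
        }
    }
}
```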
2. Benefits of removal
Removing the Apache ZooKeeper dependency provides three distinct benefits: a simpler architecture, easier operations, and better scalability (e.g. "unlimited partitions").
3. Roadmap
ZooKeeper removal is expected in 2021; its milestones are tracked in the following KIPs:
| KIP | Name | Status | Fix Version/s |
|:-------:|:--------------------------------------------------------:|:----------------:|---------------|
| KIP-455 | Create an Administrative API for Replica Reassignment | Accepted | 2.6.0 |
| KIP-497 | Add inter-broker API to alter ISR | Accepted | 2.7.0 |
| KIP-543 | Expand ConfigCommand's non-ZK functionality | Accepted | 2.6.0 |
| KIP-555 | Deprecate Direct ZK access in Kafka Administrative Tools | Accepted | None |
| KIP-589 | Add API to update Replica state in Controller | Accepted | None |
| KIP-590 | Redirect Zookeeper Mutation Protocols to The Controller | Accepted | None |
| KIP-595 | A Raft Protocol for the Metadata Quorum | Accepted | None |
| KIP-631 | The Quorum-based Kafka Controller | Under discussion | None |
KIP-500 introduced the concept of a bridge release that can coexist with both pre- and post-KIP-500 versions of Kafka. Bridge releases are important because they enable zero-downtime upgrades to the post-ZooKeeper world.
This article explains the role of ZooKeeper in Kafka. It explains how Kafka is stateless and how ZooKeeper plays an important role in the distributed nature of Kafka (and of many other distributed systems).
Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:
Step 2: Start the server
Kafka uses zookeeper so you need to first start a zookeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node zookeeper instance.
As to why: people long ago discovered that you need some way of coordinating tasks, state management, configuration, etc. across a distributed system. Some projects have built their own mechanisms (think of the configuration server in a MongoDB sharded cluster, or a master node in an Elasticsearch cluster). Others have chosen to take advantage of ZooKeeper as a general-purpose distributed process coordination system. So Kafka, Storm, HBase, and SolrCloud, to name just a few, all use ZooKeeper to help manage and coordinate.
Kafka is a distributed system and is built to use ZooKeeper. The fact that you are not using any of the distributed features of Kafka does not change how it was built. In any event, there should not be much overhead from using ZooKeeper. A bigger question is why you would use this particular design pattern: a single-broker implementation of Kafka misses out on all of the reliability features of a multi-broker cluster, along with its ability to scale.