In Kafka, I would like to use only a single broker, a single topic, and a single partition, with one producer and multiple consumers (each consumer getting its own copy of the data). Given that, is ZooKeeper still a must for running Kafka?
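For reference, here is a minimal sketch of that consumer setup using the Java client (kafka-clients): each consumer that should receive its own full copy of the data simply runs in its own consumer group, i.e. with a unique `group.id`. The broker address `localhost:9092` and the topic name `my-topic` are placeholders for this example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CopyPerConsumer {
    public static void main(String[] args) {
        // Placeholder broker address and topic name.
        String bootstrap = "localhost:9092";
        String topic = "my-topic";

        // Each consumer that should see every message must be in its OWN
        // consumer group, i.e. use a unique group.id.
        String groupId = args.length > 0 ? args[0] : "consumer-group-1";

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap); // clients talk to the broker, not to ZooKeeper
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of(topic));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("group=%s offset=%d value=%s%n", groupId, r.offset(), r.value());
                }
            }
        }
    }
}
```

Run one instance per group id (e.g. with arguments `consumer-group-1` and `consumer-group-2`) and each instance will receive the full stream from the single partition.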
Firstly
Apache ZooKeeper is a distributed store that provides configuration and synchronization services in a highly available way.
In more recent versions of Kafka, work was done so that client consumers no longer store information about how far they have consumed messages (called offsets) in ZooKeeper. This reduced usage did not, however, get rid of the need for consensus and coordination in distributed systems.
While Kafka provides fault tolerance and resilience, something is still needed to provide the required coordination, and ZooKeeper is that piece of the overall system.
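As an illustration of where offsets live today, a hedged sketch with the Java client: the consumer commits its position to Kafka's internal `__consumer_offsets` topic via `commitSync()`, so no ZooKeeper connection is involved on the client side. The broker address, group id, and topic name below are assumptions for the example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetsLiveInKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "offset-demo");             // arbitrary example group
        props.put("enable.auto.commit", "false");         // commit explicitly below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            // commitSync() stores the consumed offsets in Kafka's internal
            // __consumer_offsets topic -- the client never talks to ZooKeeper.
            consumer.commitSync();
            System.out.println("Committed offsets after reading " + records.count() + " records");
        }
    }
}
```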
Secondly
Agreeing on which broker is the leader of a partition is one practical example of how ZooKeeper is used within the Kafka ecosystem.
ZooKeeper is needed even if there is only a single broker.
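To make the leader-election point concrete, here is a small sketch using the Java AdminClient that prints the leader of each partition; the broker address and topic name are placeholders. Even on a single-broker cluster every partition has a leader, assigned by the controller (which, pre-KIP-500, relies on ZooKeeper).

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowPartitionLeader {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed single local broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("my-topic"))
                                         .all().get()
                                         .get("my-topic");
            // Print which broker currently leads each partition.
            desc.partitions().forEach(p ->
                System.out.printf("partition %d -> leader broker id %d%n",
                                  p.partition(), p.leader().id()));
        }
    }
}
```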
These points are from the Kafka in Action book.
The request to run Kafka without ZooKeeper seems to be quite common. The library Charlatan addresses this.
According to its description, Charlatan is more or less a mock for ZooKeeper, providing the ZooKeeper services backed either by other tools or by a database.
I encountered that library when dealing with the main product of Charlatan's authors; there it works fine …
Important update - August 2019:
ZooKeeper dependency will be removed from Apache Kafka. See the high-level discussion in KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum.
These efforts will take a few Kafka releases and additional KIPs. Kafka Controllers will take over the tasks currently handled by ZooKeeper. The Controllers will leverage the benefits of the Event Log, which is a core concept of Kafka.
Some benefits of the new Kafka architecture are a simpler architecture, ease of operations, and better scalability, e.g. allowing "unlimited partitions".
Update - November 2020:
For the latest version (2.6.0), ZooKeeper is still required for running Kafka, but in the near future ZooKeeper will be replaced with a Self-Managed Metadata Quorum.
See details in the accepted KIP-500.
1. Current status
Kafka uses ZooKeeper to store its metadata about partitions and brokers, and to elect a broker to be the Kafka Controller.
Currently, removing this dependency on ZooKeeper is work in progress (through KIP-500).
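If you want to see that metadata for yourself, here is a rough sketch that uses the plain ZooKeeper Java client to read the znodes Kafka creates, namely /brokers/ids (broker registrations) and /controller (the elected controller). The connection string localhost:2181 assumes a quickstart-style local ZooKeeper.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

public class InspectKafkaZNodes {
    public static void main(String[] args) throws Exception {
        // Connect to the local ZooKeeper used by the Kafka quickstart (assumed address).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
        try {
            // Broker registrations live under /brokers/ids.
            System.out.println("registered brokers: " + zk.getChildren("/brokers/ids", false));
            // The currently elected controller is recorded in the /controller znode.
            byte[] controller = zk.getData("/controller", false, null);
            System.out.println("controller: " + new String(controller, StandardCharsets.UTF_8));
        } finally {
            zk.close();
        }
    }
}
```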
2. Benefits of removal
Removing the Apache ZooKeeper dependency provides three distinct benefits: a simpler architecture, easier operations, and better scalability (e.g. "unlimited partitions").
3. Roadmap
ZooKeeper removal is expected in 2021; its milestones are tracked in the following KIPs:
| KIP | Name | Status | Fix Version/s |
|:-------:|:--------------------------------------------------------:|:----------------:|---------------|
| KIP-455 | Create an Administrative API for Replica Reassignment | Accepted | 2.6.0 |
| KIP-497 | Add inter-broker API to alter ISR | Accepted | 2.7.0 |
| KIP-543 | Expand ConfigCommand's non-ZK functionality | Accepted | 2.6.0 |
| KIP-555 | Deprecate Direct ZK access in Kafka Administrative Tools | Accepted | None |
| KIP-589 | Add API to update Replica state in Controller | Accepted | None |
| KIP-590 | Redirect Zookeeper Mutation Protocols to The Controller | Accepted | None |
| KIP-595 | A Raft Protocol for the Metadata Quorum | Accepted | None |
| KIP-631 | The Quorum-based Kafka Controller | Under discussion | None |
KIP-500 introduced the concept of a bridge release that can coexist with both pre- and post-KIP-500 versions of Kafka. Bridge releases are important because they enable zero-downtime upgrades to the post-ZooKeeper world.
This article explains the role of ZooKeeper in Kafka. It explains how Kafka is stateless and how ZooKeeper plays an important role in the distributed nature of Kafka (and of many other distributed systems).
Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:
Step 2: Start the server
Kafka uses zookeeper so you need to first start a zookeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node zookeeper instance.
As to why: people long ago discovered that you need some way of coordinating tasks, state management, configuration, etc. across a distributed system. Some projects have built their own mechanisms (think of the configuration server in a MongoDB sharded cluster, or a master node in an Elasticsearch cluster). Others have chosen to take advantage of ZooKeeper as a general-purpose distributed process coordination system. So Kafka, Storm, HBase, and SolrCloud, to name just a few, all use ZooKeeper to help manage and coordinate.
Kafka is a distributed system and is built to use ZooKeeper. The fact that you are not using any of the distributed features of Kafka does not change how it was built. In any event, there should not be much overhead from using ZooKeeper. A bigger question is why you would use this particular design pattern: a single-broker implementation of Kafka misses out on all of the reliability features of a multi-broker cluster, along with its ability to scale.