Is scalability applicable with Kafka stream if each topic has single partition

只愿长相守 posted on 2019-12-25 01:27:56

Question


My understanding, based on the Kafka Streams documentation, is that the maximum number of parallel tasks equals the largest partition count among all topics in the cluster.

I have around 60 topics in my Kafka cluster, and each topic has only a single partition. Is it possible to achieve scalability/parallelism with Kafka Streams for this cluster?


Answer 1:


Do you want to do the same computation over all topics? If so, I would recommend introducing an extra topic with many partitions that you use to scale out:

// using new 1.0 API
StreamsBuilder builder = new StreamsBuilder();
KStream parallelizedStream = builder
    .stream(/* subscribe to all topics at once */)
    // write to and read back from the many-partition topic to spread the data
    .through("topic-with-many-partitions");

// apply computation
parallelizedStream...

Note: You need to create the topic "topic-with-many-partitions" manually before starting your Streams application.

Pro Tip:

The topic "topic-with-many-partitions" can have a very short retention time, as it is only used for scaling and does not need to hold data long term.

Update

If you have 10 topics T1 to T10 with a single partition each, the program above will execute as follows (with TN being the dummy topic with 10 partitions):

T1-0  --+           +--> TN-0  --> T1_0
...   --+--> T0_0 --+--> ...   --> ...
T10-0 --+           +--> TN-9  --> T1_9

The first part of your program only reads all 10 input topics and writes the records back into the 10 partitions of TN. Afterwards, you can get up to 10 parallel tasks, each processing one partition of TN. If you start 10 KafkaStreams instances, only one will execute T0_0, and each will also have one T1_x task running.
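The resulting task layout can be mimicked with a small stand-alone sketch. It does not use the Kafka Streams API at all: the class name `TaskSpreadSketch`, the `assign` helper, and the round-robin placement are illustrative assumptions, not the real Streams assignor. Task names follow the T<sub-topology>_<partition> pattern from the diagram, with partitions counted from 0.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the task layout described above.
// Assumption: round-robin placement stands in for the real assignment logic.
public class TaskSpreadSketch {

    // Returns, for each instance, the list of task names it would run.
    static List<List<String>> assign(int repartitionPartitions, int instances) {
        List<List<String>> perInstance = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            perInstance.add(new ArrayList<>());
        }
        // The first part reads only single-partition input topics,
        // so it yields exactly one task, placed here on instance 0.
        perInstance.get(0).add("T0_0");
        // The second part reads TN: one task per partition, spread round-robin.
        for (int p = 0; p < repartitionPartitions; p++) {
            perInstance.get(p % instances).add("T1_" + p);
        }
        return perInstance;
    }

    public static void main(String[] args) {
        // 10 partitions in TN, 10 application instances.
        List<List<String>> plan = assign(10, 10);
        for (int i = 0; i < plan.size(); i++) {
            System.out.println("instance " + i + " -> " + plan.get(i));
        }
    }
}
```

With 10 partitions and 10 instances, every instance in this model ends up with one T1_x task and a single instance additionally runs T0_0. Starting more than 10 instances would leave the extra ones idle, since there are no further tasks to hand out.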



Source: https://stackoverflow.com/questions/47325678/is-scalability-applicable-with-kafka-stream-if-each-topic-has-single-partition
