Spark Streaming + Kafka Integration: Support new topic subscriptions without requiring restart of the streaming context

Submitted by 核能气质少年 on 2021-01-28 05:16:07

Question


I am using a Spark Streaming application (Spark 2.1) to consume data from Kafka (0.10.1) topics. I want to subscribe to a new topic without restarting the streaming context. Is there any way to achieve this?

I can see a JIRA ticket in the Apache Spark project for this (https://issues.apache.org/jira/browse/SPARK-10320). Even though it was closed in version 2.0, I couldn't find any documentation or example of how to do this. If any of you are familiar with this, please point me to documentation or an example. Thanks in advance.


Answer 1:


The integration between Spark 2.0.x and Kafka 0.10.x supports a subscription pattern. From the documentation:

SubscribePattern allows you to use a regex to specify topics of interest. Note that unlike the 0.8 integration, using Subscribe or SubscribePattern should respond to adding partitions during a running stream.

You can use a regex pattern to subscribe to all the topics you wish.

class SubscribePattern[K, V](
    pattern: java.util.regex.Pattern,
    kafkaParams: java.util.Map[String, Object],
    offsets: java.util.Map[TopicPartition, java.util.Long]
) extends ConsumerStrategy[K, V]
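As a rough sketch of how this could be wired up with the kafka-0-10 direct stream API (the topic pattern, broker address, and group id below are placeholders): any topic whose name matches the regex, including topics created after the stream starts, should be picked up on the consumer's next metadata refresh.

```scala
import java.util.regex.Pattern

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val ssc = new StreamingContext(
  new SparkConf().setAppName("pattern-demo"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "host1:port1",          // placeholder broker list
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "pattern-demo-group"    // placeholder group id
)

// Subscribe to every topic matching "events-.*" instead of a fixed list.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.SubscribePattern[String, String](
    Pattern.compile("events-.*"), kafkaParams)
)

stream.map(record => (record.key, record.value)).print()
ssc.start()
ssc.awaitTermination()
```

Note that how quickly newly created topics are noticed is governed by the Kafka consumer's metadata refresh interval (`metadata.max.age.ms`).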



Answer 2:


You can subscribe to multiple topics by passing a comma-separated list, e.g. topic1,topic2:

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .load()

For more information, see the Structured Streaming + Kafka Integration Guide.
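A fixed comma-separated list still requires restarting the query to add a topic. For the original question's use case, Structured Streaming also accepts a `subscribePattern` option taking a Java regex, so topics created later that match the pattern can be consumed by the running query (the pattern below is a placeholder):

```scala
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  // matches topic1, topic2, and any future topic starting with "topic"
  .option("subscribePattern", "topic.*")
  .load()
```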




Answer 3:


I found this solution more suitable for my purpose. We can share a StreamingContext instance across different DStreams. For better management, create a separate DStream for each topic using the same streaming context, and store each DStream in a map keyed by its topic name, so that later you can stop or unsubscribe from that particular topic. Please see the code below for clarity.

Code: https://gist.github.com/shemeemsp7/01d21588347b94204c71a14005be8afa
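The linked gist is the authoritative code; a hypothetical sketch of the idea it describes (one DStream per topic in a shared map, `ssc` and `kafkaParams` assumed to be set up as in Answer 1) might look like this. Keep in mind that Spark Streaming requires all input DStreams to be registered before `ssc.start()` is called:

```scala
import scala.collection.mutable

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010._

// One DStream per topic, all sharing the same StreamingContext,
// kept in a map so an individual topic's stream can be found later.
val streams =
  mutable.Map.empty[String, InputDStream[ConsumerRecord[String, String]]]

def subscribe(topic: String): Unit = {
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams))
  stream.foreachRDD(rdd => rdd.foreach(record => println(record.value())))
  streams(topic) = stream
}

// Stops that topic's input stream and drops it from the map.
def unsubscribe(topic: String): Unit =
  streams.remove(topic).foreach(_.stop())
```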


Source: https://stackoverflow.com/questions/46336620/spark-streaming-kafka-integration-support-new-topic-subscriptions-without-re
