Question
I am using a Spark Streaming application (Spark 2.1) to consume data from Kafka (0.10.1) topics. I want to subscribe to a new topic without restarting the streaming context. Is there any way to achieve this?
I can see a JIRA ticket in the Apache Spark project for this (https://issues.apache.org/jira/browse/SPARK-10320). Even though it was closed in version 2.0, I couldn't find any documentation or example of how to do it. If any of you are familiar with this, please point me to documentation or an example. Thanks in advance.
Answer 1:
The integration between Spark 2.0.x and Kafka 0.10.x supports a subscription pattern. From the documentation:
SubscribePattern allows you to use a regex to specify topics of interest. Note that unlike the 0.8 integration, using Subscribe or SubscribePattern should respond to adding partitions during a running stream.
You can use a regex pattern to subscribe to all the topics you are interested in:
class SubscribePattern[K, V](
    pattern: java.util.regex.Pattern,
    kafkaParams: java.util.Map[String, Object],
    offsets: java.util.Map[TopicPartition, java.util.Long]
) extends ConsumerStrategy[K, V]
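A minimal sketch of wiring this into a direct stream via ConsumerStrategies.SubscribePattern (from the spark-streaming-kafka-0-10 artifact); the broker address, group id, and topic regex here are hypothetical placeholders:

import java.util.regex.Pattern

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setAppName("SubscribePatternExample")
val ssc = new StreamingContext(conf, Seconds(10))

// Hypothetical broker address and group id; replace with your own.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "host1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "pattern-subscriber",
  "auto.offset.reset" -> "latest"
)

// Every topic whose name matches the regex is consumed, including
// topics created after the stream has started.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.SubscribePattern[String, String](
    Pattern.compile("events-.*"), kafkaParams)
)

stream.map(record => (record.key, record.value)).print()

ssc.start()
ssc.awaitTermination()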
Answer 2:
With Structured Streaming you can subscribe to multiple topics, e.g. topic1, topic2:
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .load()
For more information, see the Structured Streaming + Kafka Integration Guide.
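If the goal is to pick up topics that do not exist yet, the same Kafka source also accepts a subscribePattern option, which takes a regex instead of a fixed topic list. A sketch with a hypothetical pattern:

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  // Hypothetical regex: matches every topic whose name starts with "topic".
  .option("subscribePattern", "topic.*")
  .load()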
Answer 3:
I found this solution more suitable for my purpose. We can share a StreamingContext instance between different DStreams. For better management, we can create a separate DStream instance for each topic using the same streaming context, and store each DStream in a map keyed by its topic name, so that you can later stop or unsubscribe from that particular topic. See the linked gist, and the sketch after it, for clarity.
Code: https://gist.github.com/shemeemsp7/01d21588347b94204c71a14005be8afa
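A minimal sketch of the bookkeeping described above; the linked gist has the author's full version. The names streamsByTopic, subscribe, and unsubscribe are illustrative, and note that Spark Streaming requires all DStreams to be defined before the StreamingContext is started, so stopping a running input stream this way is the gist author's approach rather than an officially documented one:

import scala.collection.mutable

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// One DStream per topic, keyed by topic name, all sharing the same StreamingContext.
val streamsByTopic =
  mutable.Map.empty[String, InputDStream[ConsumerRecord[String, String]]]

def subscribe(ssc: StreamingContext, topic: String,
              kafkaParams: Map[String, Object]): Unit = {
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams))
  // Illustrative output action; replace with your own processing.
  stream.foreachRDD(rdd => println(s"$topic: ${rdd.count()} records"))
  streamsByTopic(topic) = stream
}

// To drop a topic later, stop its input stream and forget it.
def unsubscribe(topic: String): Unit =
  streamsByTopic.remove(topic).foreach(_.stop())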
Source: https://stackoverflow.com/questions/46336620/spark-streaming-kafka-integration-support-new-topic-subscriptions-without-re