MessageHandler in KafkaUtils010 SparkStreaming

Posted by 扶醉桌前 on 2019-12-12 03:34:17

Question


I want to group messages by topic, or at least know which topic each message came from, when using:

val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    Subscribe[String, String](
      Array(topicConfig.srcTopic),
      kafkaParameters(BOOTSTRAP_SERVERS, "kafka_test_group_id")
    )
  )

However, the latest API (kafka010) does not seem to support a message handler the way previous versions did. Any idea how to get the topic?

My goal is to consume from N topics, process the messages (in different ways depending on the topic), and then push the results to another N topics, with a 1:1 mapping between source and destination topics:

SrcTopicA --> Process --> DstTopicA
SrcTopicB --> Process --> DstTopicB
SrcTopicC --> Process --> DstTopicC

However, some attributes need to be shared across topics, and they change too often for a broadcast variable to be an option. So all the topics need to be consumed in the same Spark job.


Answer 1:


When you use createDirectStream in 0.10, you get back an InputDStream of ConsumerRecord objects. Each record exposes its topic through topic(), so you can map the stream to a tuple of the topic and value:

val stream: InputDStream[ConsumerRecord[String, String]] = 
  KafkaUtils.createDirectStream[String, String](
    streamingContext,
    PreferConsistent,
    Subscribe[String, String](topics, kafkaParams)
  )

val res: DStream[(String, String)] = stream.map(record => (record.topic(), record.value()))
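
Once the topic is available on every record, the 1:1 mapping from the question can be expressed as a lookup table. The following is only a minimal sketch in plain Scala: the names TopicRouting, Route and route are hypothetical, the topic names are taken from the question's diagram, and the three transformations are placeholders for the real per-topic processing.

```scala
object TopicRouting {
  // Hypothetical pairing of a destination topic with the processing for its source topic.
  final case class Route(dstTopic: String, process: String => String)

  // Source topic -> route. The transformations are placeholders, not real business logic.
  val routes: Map[String, Route] = Map(
    "SrcTopicA" -> Route("DstTopicA", _.toUpperCase),
    "SrcTopicB" -> Route("DstTopicB", _.reverse),
    "SrcTopicC" -> Route("DstTopicC", _.trim)
  )

  // Returns Some((destinationTopic, processedValue)), or None for an unmapped topic.
  def route(topic: String, value: String): Option[(String, String)] =
    routes.get(topic).map(r => (r.dstTopic, r.process(value)))
}
```

On the stream above, this could presumably be applied as stream.flatMap(record => TopicRouting.route(record.topic(), record.value())), which yields (destination topic, processed value) pairs and drops records from unmapped topics.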



Answer 2:


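You can filter the stream by topic. Note that filter needs a Boolean predicate, so compare the record's topic against the name you want (SrcTopicA here, as in the question's example):

stream.filter(cr => cr.topic() == "SrcTopicA")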
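
To handle every topic this way, the same predicate can be applied once per topic name. The sketch below models a batch as a plain Seq so it runs without Spark. Rec is a hypothetical stand-in for ConsumerRecord, and TopicFilter and splitByTopic are illustrative names rather than part of any library.

```scala
object TopicFilter {
  // Hypothetical stand-in for ConsumerRecord, carrying only the fields used here.
  final case class Rec(topic: String, value: String)

  // Applies one topic-equality filter per requested topic and keeps the values.
  def splitByTopic(batch: Seq[Rec], topics: Seq[String]): Map[String, Seq[String]] =
    topics.map(t => t -> batch.filter(r => r.topic == t).map(_.value)).toMap
}
```

With a real DStream, each filter produces a separate stream that scans all records, so for many topics the map-to-tuple approach from Answer 1 may be the cheaper option.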


Source: https://stackoverflow.com/questions/42813846/messagehandler-in-kafkautils010-sparkstreaming
