Kafka and Spark: Get first offset of a topic via API

Posted by 喜夏-厌秋 on 2019-12-14 02:30:56

Question


I am experimenting with Spark Streaming and Kafka (using the Scala API), and would like to read messages from a set of Kafka topics with Spark Streaming.

The following method:

val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "smallest")
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

reads from Kafka up to the latest available offset, but it doesn't give me the metadata I need: since I am reading from a set of topics, I need to know the topic of every message I read. This other method, KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Tuple2[String, String]](ssc, kafkaParams, currentOffsets, messageHandler), explicitly requires the starting offsets, which I don't have.

I know that there is this shell command that gives you the last offset.

kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list <broker>:<port> \
  --topic <topic-name> --time -1 --offsets 1

KafkaCluster.scala is a developer API that used to be public and provides exactly what I need.
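For reference, GetOffsetShell prints one line per partition in the form topic:partition:offset, and passing --time -2 instead of -1 asks for the earliest offset rather than the latest. Below is a minimal sketch of turning that output into a map, assuming that line format. The object and method names are hypothetical and are not part of Kafka or Spark.

```scala
// Hypothetical helper: not part of Kafka or Spark.
object GetOffsetShellOutput {

  // One output line: topic:partition:offset[,offset...]
  // Only the first listed offset is captured.
  private val Line = """([^:\s]+):(\d+):(\d+)(?:,\d+)*""".r

  // Parses the tool's output into (topic, partition) -> offset.
  // Lines that do not match the expected format are skipped.
  def parse(output: String): Map[(String, Int), Long] =
    output.split("\n").toList
      .map(_.trim)
      .collect { case Line(topic, partition, offset) =>
        (topic, partition.toInt) -> offset.toLong
      }
      .toMap
}
```

For example, GetOffsetShellOutput.parse("events:0:42\nevents:1:7") yields Map(("events", 0) -> 42L, ("events", 1) -> 7L). Shelling out like this is only a workaround; an in-process API call avoids the parsing step.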

Any hints?


Answer 1:


You can use the code from GetOffsetShell.scala (see the Kafka API documentation):

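import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer
// time = OffsetRequest.EarliestTime (-2) requests the first offset, LatestTime (-1) the last
val consumer = new SimpleConsumer(leader.host, leader.port, 10000, 100000, clientId)
val topicAndPartition = TopicAndPartition(topic, partitionId)
val request = OffsetRequest(Map(topicAndPartition -> PartitionOffsetRequestInfo(time, nOffsets)))
val offsets = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(topicAndPartition).offsets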

Alternatively, you can create a new consumer with a unique groupId and use it to get the first offset of each partition:

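// Assumes the implicit Java/Scala collection conversions are in scope, and that
// createConsumerConfig and config are your own helpers (not shown here).
import scala.collection.JavaConversions._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

val consumer = new KafkaConsumer[String, String](createConsumerConfig(config.brokerList))
consumer.partitionsFor(config.topic).foreach(pi => {
      val topicPartition = new TopicPartition(pi.topic(), pi.partition())

      consumer.assign(List(topicPartition))
      // Kafka 0.9 client API; later clients take a collection of partitions instead
      consumer.seekToBeginning()
      val firstOffset = consumer.position(topicPartition)
 ...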
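To connect this back to the question: once the first offsets are known, they can be passed to the createDirectStream overload that takes starting offsets and a messageHandler, and the handler can keep the topic of each message. The following is a rough sketch, assuming the Kafka 0.8 direct-stream integration shipped with Spark 1.x (the spark-streaming-kafka artifact). The names TopicAwareStream, toFromOffsets, and create are hypothetical.

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical helper names; assumes the Kafka 0.8 direct-stream integration.
object TopicAwareStream {

  // Converts (topic, partition) -> first offset into the key type Spark expects.
  def toFromOffsets(firstOffsets: Map[(String, Int), Long]): Map[TopicAndPartition, Long] =
    firstOffsets.map { case ((topic, partition), offset) =>
      TopicAndPartition(topic, partition) -> offset
    }

  // Builds a stream of (topic, message value) pairs starting at the given offsets.
  def create(
      ssc: StreamingContext,
      kafkaParams: Map[String, String],
      fromOffsets: Map[TopicAndPartition, Long]): InputDStream[(String, String)] = {
    // The handler sees the full metadata of each record, including its topic.
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.topic, mmd.message())
    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)
  }
}
```

Each element of the resulting stream is a (topic, value) tuple, so downstream code can branch on the topic name. Other metadata such as mmd.partition or mmd.offset could be returned from the handler in the same way.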


Source: https://stackoverflow.com/questions/43281893/kafka-and-spark-get-first-offset-of-a-topic-via-api
