Extract the timestamp from Kafka messages in Spark Streaming?

Submitted by 天涯浪子 on 2019-12-07 19:01:25

Question


I am trying to read from a Kafka source and want to extract the timestamp from each received message for use in Spark Structured Streaming. Versions: Kafka 0.10.0.0, Spark Streaming 2.0.1.


Answer 1:


I'd suggest a couple of things:

  1. Suppose you create a stream via the latest Kafka streaming API (Kafka 0.10).

    E.g. you use the dependency: "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"

    Then you create a stream, according to the docs:

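    import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id" -> "spark-streaming-test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    // "your-topic" is a placeholder; list the topics you want to consume
    val topics = Array("your-topic")

    val sparkConf = new SparkConf()
    // suppose you have a 60-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(60))
    ssc.checkpoint("checkpoint")

    val stream = KafkaUtils.createDirectStream[String, Array[Byte]](ssc, PreferConsistent,
      Subscribe[String, Array[Byte]](topics, kafkaParams))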
    
  2. Your stream will be a DStream of ConsumerRecord[String, Array[Byte]], and you can get the timestamp, key and value as simply as:

    stream.map { record => (record.timestamp(), record.key(), record.value())  }
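
Note that record.timestamp() returns a plain Long holding milliseconds since the Unix epoch. If you would rather work with a proper time type, a minimal sketch of the conversion could look like the following. The object name TimestampDemo, the helper toInstant and the sample tuple are made up for illustration and are not part of Spark or Kafka.

```scala
import java.time.Instant

object TimestampDemo {
  // Hypothetical helper (not part of Spark or Kafka): turns the epoch-millisecond
  // Long returned by ConsumerRecord.timestamp() into a java.time.Instant.
  def toInstant(epochMillis: Long): Instant = Instant.ofEpochMilli(epochMillis)

  def main(args: Array[String]): Unit = {
    // Made-up stand-in for one (timestamp, key, value) tuple produced by the map above.
    val (ts, key, value) = (1481137285000L, "key-1", "payload".getBytes("UTF-8"))
    println(s"$key @ ${toInstant(ts)} (${value.length} bytes)")
  }
}
```

On the stream itself, the same helper could be applied inside the map, e.g. stream.map { record => (TimestampDemo.toInstant(record.timestamp()), record.key(), record.value()) }.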
    

Hope that helps.




Answer 2:


import spark.implicits._  // enables the $"column" syntax

spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "your.server.com:9092")
  .option("subscribe", "your-topic")
  .load()
  .select($"timestamp", $"value")

The "timestamp" field is what you are looking for; its type is java.sql.Timestamp. Make sure you are connecting to a Kafka 0.10 server, since earlier versions have no message timestamp. The full list of fields is described here: http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
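
The snippet above is a batch read. Since the question is about Structured Streaming, here is a rough sketch of the streaming counterpart, assuming a Spark version that ships the Kafka source for Structured Streaming (the spark-sql-kafka-0-10 artifact) on the classpath. The object name KafkaTimestampStream and the helper timestampAndValue are made up for illustration, and the server and topic names are the same placeholders as above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KafkaTimestampStream {
  // Hypothetical helper: keeps only the Kafka record timestamp and the
  // payload, with the binary "value" column cast to a string.
  def timestampAndValue(kafkaDf: DataFrame): DataFrame =
    kafkaDf.select(col("timestamp"), col("value").cast("string").as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-timestamp-sketch")
      .getOrCreate()

    // Placeholder server and topic; replace with your own.
    val messages = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "your.server.com:9092")
      .option("subscribe", "your-topic")
      .load()

    // Print each micro-batch to the console to inspect the timestamps.
    timestampAndValue(messages)
      .writeStream
      .format("console")
      .option("truncate", "false")
      .start()
      .awaitTermination()
  }
}
```

The selection is kept in its own function so it can be tried on an ordinary DataFrame with "timestamp" and "value" columns, without a running Kafka broker.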



Source: https://stackoverflow.com/questions/40586663/extract-the-time-stamp-from-kafka-messages-in-spark-streaming
