How to manually commit offset in Spark Kafka direct streaming?

独厮守ぢ 2020-12-11 05:50

I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.

We have a Spark streaming application consuming a Kaf

2 answers
  • 2020-12-11 06:05

    If you check your logs, you will see:

    2019-10-24 14:14:45 WARN  KafkaUtils:66 - overriding enable.auto.commit to false for executor
    2019-10-24 14:14:45 WARN  KafkaUtils:66 - overriding auto.offset.reset to none for executor
    2019-10-24 14:14:45 WARN  KafkaUtils:66 - overriding executor group.id to spark-executor-customer_pref_az_kafka_spout_stg_2
    2019-10-24 14:14:45 WARN  KafkaUtils:66 - overriding receive.buffer.bytes to 65536 see KAFKA-3135
    

    Spark itself overrides these properties for the executors, regardless of what you set in the Kafka parameters.

    To commit offsets manually, follow the Spark docs:

    https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
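
    A minimal sketch of the pattern that section of the docs describes, assuming the `spark-streaming-kafka-0-10` integration. The broker address, topic name, group id, and batch interval are placeholders and do not come from the question. The offset ranges are read from each batch's RDD and committed back to Kafka only after the batch's output has completed:

    ```scala
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object ManualCommitSketch {
      def main(args: Array[String]): Unit = {
        // The master URL is expected to come from spark-submit.
        val conf = new SparkConf().setAppName("manual-commit-sketch")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Placeholder connection settings (assumptions, not from the question).
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))

        stream.foreachRDD { rdd =>
          // Must be read from the RDD produced directly by createDirectStream.
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

          // Stand-in for the real output action; runs on the executors.
          rdd.foreach(record => println(record.value()))

          // Queue the commit on the driver once the output action has returned.
          stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }
    ```

    Note that the cast to `HasOffsetRanges` only succeeds on the untransformed stream, and that the commit is not transactional with your output, so the output should be idempotent.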

  • 2020-12-11 06:15

    The article below could be a good start to understand the approach.

    spark-kafka-achieving-zero-data-loss

    Furthermore, the article suggests using a ZooKeeper client directly, which can also be replaced by something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools which depend on offsets saved in ZooKeeper keep working. The offsets can also be saved on HDFS or any other reliable service.
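
    A rough sketch of the "save offsets yourself" approach, not the article's code. It assumes the `spark-streaming-kafka-0-10` integration and uses a local file as a hypothetical stand-in for ZooKeeper, HDFS, or a database. `loadOffsets`, `saveOffsets`, the file path, the topic, and the connection settings are all illustrative names chosen here:

    ```scala
    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths}
    import scala.collection.JavaConverters._

    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils, OffsetRange}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Assign
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object ExternalOffsetStoreSketch {
      // Hypothetical store: a local file standing in for ZooKeeper, HDFS, or a database.
      private val storePath = Paths.get("/tmp/example-offsets.csv")

      // Read "topic,partition,offset" lines; returns an empty map if nothing was saved yet.
      def loadOffsets(): Map[TopicPartition, Long] =
        if (!Files.exists(storePath)) Map.empty
        else Files.readAllLines(storePath, StandardCharsets.UTF_8).asScala.map { line =>
          val Array(topic, partition, offset) = line.split(",")
          new TopicPartition(topic, partition.toInt) -> offset.toLong
        }.toMap

      // Overwrite the file with the end offset of every range in the batch.
      def saveOffsets(ranges: Array[OffsetRange]): Unit = {
        val lines = ranges.map(r => s"${r.topic},${r.partition},${r.untilOffset}").toSeq
        Files.write(storePath, lines.asJava, StandardCharsets.UTF_8)
      }

      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("external-offsets-sketch"), Seconds(10))

        // Placeholder connection settings (assumptions).
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // Partitions are assigned explicitly; partitions without a saved offset
        // fall back to the committed offset or auto.offset.reset.
        val partitions = Seq(new TopicPartition("example-topic", 0))
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Assign[String, String](partitions, kafkaParams, loadOffsets()))

        stream.foreachRDD { rdd =>
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          rdd.foreach(record => println(record.value())) // stand-in output, runs on executors
          saveOffsets(offsetRanges)                      // runs on the driver after the output
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }
    ```

    Because the offsets are saved after the output, a failure between the two steps replays the batch on restart, so this gives at-least-once rather than exactly-once delivery unless the output and the offsets are written in one transaction.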
