I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.
We have a Spark streaming application consuming a Kafka topic, and we need to commit the offsets back to Kafka manually.
If you check your logs, you will see the following:
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding enable.auto.commit to false for executor
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding auto.offset.reset to none for executor
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding executor group.id to spark-executor-customer_pref_az_kafka_spout_stg_2
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding receive.buffer.bytes to 65536 see KAFKA-3135
These properties are overridden by Spark's Kafka integration code on the executors.
To commit offsets manually, you can follow the "Kafka itself" storage option in the Spark docs:
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
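A minimal sketch of that approach is below. It assumes an existing streamingContext, and the broker address and topic name are placeholders; the group id is taken from the log lines above. The stream creation, offset retrieval, and commitAsync calls follow the API described in the linked docs.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._

    // Driver-side consumer config; Spark still forces enable.auto.commit=false on executors.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",               // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "customer_pref_az_kafka_spout_stg_2",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext,                                     // an existing StreamingContext
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // Must be the first call on the RDD returned by createDirectStream.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // ... write the batch's output here ...

      // Commit back to Kafka only after the output has been stored successfully.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

Note that commitAsync is asynchronous and not transactional, so for exactly-once semantics you have to store offsets together with your output, which is what the article below discusses.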
The article below is a good starting point for understanding the approach:
spark-kafka-achieving-zero-data-loss
Furthermore, the article suggests using the ZooKeeper client directly, which could also be replaced with something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools which rely on offsets saved in ZooKeeper keep working. The offset information can also be saved to HDFS or any other reliable store.
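For illustration only, here is a rough sketch of saving each batch's ending offsets to ZooKeeper with Apache Curator. The ZkOffsetStore helper, the zk-host:2181 connection string, and the znode layout are assumptions made for this example; the path mirrors the old /consumers/<group>/offsets/<topic>/<partition> convention so ZooKeeper-based monitoring tools can pick the offsets up.

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.ExponentialBackoffRetry
    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    // Hypothetical helper: persists each partition's ending offset to ZooKeeper.
    object ZkOffsetStore {
      private val client = CuratorFrameworkFactory.newClient(
        "zk-host:2181", new ExponentialBackoffRetry(1000, 3))  // placeholder connect string
      client.start()

      def save(groupId: String, rdd: org.apache.spark.rdd.RDD[_]): Unit = {
        rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { range =>
          val path = s"/consumers/$groupId/offsets/${range.topic}/${range.partition}"
          val data = range.untilOffset.toString.getBytes("UTF-8")
          if (client.checkExists().forPath(path) == null)
            client.create().creatingParentsIfNeeded().forPath(path, data)
          else
            client.setData().forPath(path, data)
        }
      }
    }

You would call ZkOffsetStore.save(groupId, rdd) inside foreachRDD after the batch's output has been written, in the same place commitAsync sits in the earlier sketch; on restart the application reads the saved offsets back and passes them to ConsumerStrategies.Subscribe or Assign.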