I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.
We have a Spark streaming application consuming a Kafka topic, and we need to commit the offsets back to Kafka manually.
If you check your logs, you will see the following:
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding enable.auto.commit to false for executor
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding auto.offset.reset to none for executor
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding executor group.id to spark-executor-customer_pref_az_kafka_spout_stg_2
2019-10-24 14:14:45 WARN KafkaUtils:66 - overriding receive.buffer.bytes to 65536 see KAFKA-3135
These properties are overridden by Spark's Kafka integration code on the executors.
To commit offsets manually, you can follow the "Kafka itself" storage option in the Spark docs:
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
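A minimal sketch of that approach is below. It assumes an existing streamingContext, and the broker address and topic name are placeholders; the group id is taken from the log lines above. The stream creation, offset retrieval, and commitAsync calls follow the API described in the linked docs.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._

    // Driver-side consumer config; Spark still forces enable.auto.commit=false on executors.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",               // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "customer_pref_az_kafka_spout_stg_2",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext,                                     // an existing StreamingContext
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // Must be the first call on the RDD returned by createDirectStream.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // ... write the batch's output here ...

      // Commit back to Kafka only after the output has been stored successfully.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

Note that commitAsync is asynchronous and not transactional, so for exactly-once semantics you have to store offsets together with your output, which is what the article below discusses.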
The article below is a good starting point for understanding the approach:
spark-kafka-achieving-zero-data-loss
Furthermore, the article suggests using the ZooKeeper client directly, which could also be replaced with something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is that monitoring tools which rely on offsets saved in ZooKeeper keep working. The offset information can also be saved to HDFS or any other reliable store.
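For illustration only, here is a rough sketch of saving each batch's ending offsets to ZooKeeper with Apache Curator. The ZkOffsetStore helper, the zk-host:2181 connection string, and the znode layout are assumptions made for this example; the path mirrors the old /consumers/<group>/offsets/<topic>/<partition> convention so ZooKeeper-based monitoring tools can pick the offsets up.

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.ExponentialBackoffRetry
    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    // Hypothetical helper: persists each partition's ending offset to ZooKeeper.
    object ZkOffsetStore {
      private val client = CuratorFrameworkFactory.newClient(
        "zk-host:2181", new ExponentialBackoffRetry(1000, 3))  // placeholder connect string
      client.start()

      def save(groupId: String, rdd: org.apache.spark.rdd.RDD[_]): Unit = {
        rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { range =>
          val path = s"/consumers/$groupId/offsets/${range.topic}/${range.partition}"
          val data = range.untilOffset.toString.getBytes("UTF-8")
          if (client.checkExists().forPath(path) == null)
            client.create().creatingParentsIfNeeded().forPath(path, data)
          else
            client.setData().forPath(path, data)
        }
      }
    }

You would call ZkOffsetStore.save(groupId, rdd) inside foreachRDD after the batch's output has been written, in the same place commitAsync sits in the earlier sketch; on restart the application reads the saved offsets back and passes them to ConsumerStrategies.Subscribe or Assign.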