Question
How will Kafka deal with a call to
KafkaConsumer.commitAsync(Map<TopicPartition, OffsetAndMetadata> offsets, OffsetCommitCallback callback)
when the offset given for a topic partition is lower than in a previous invocation?
Answer 1:
It will simply set the committed offset of the partition to the value you specified, so the next time the consumer starts from the committed position (for example after a restart or a rebalance) it resumes from exactly that offset.
The Javadoc of commitAsync() says:
The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
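To make the "+ 1" convention from the Javadoc concrete, here is a minimal sketch in plain Java. It does not use the Kafka client: String keys stand in for TopicPartition and Long values for OffsetAndMetadata, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CommitOffsetsSketch {

    // Turns "last processed offset" per partition into "offset to commit" per partition.
    // String keys stand in for TopicPartition, Long values for OffsetAndMetadata (assumption).
    static Map<String, Long> toCommit(Map<String, Long> lastProcessed) {
        Map<String, Long> offsets = new HashMap<>();
        for (Map.Entry<String, Long> entry : lastProcessed.entrySet()) {
            // The committed offset is the next record to consume, not the last one processed.
            offsets.put(entry.getKey(), entry.getValue() + 1);
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<String, Long> lastProcessed = new HashMap<>();
        lastProcessed.put("myOutputTopic-0", 41L);
        System.out.println(toCommit(lastProcessed)); // {myOutputTopic-0=42}
    }
}
```

With the real client, the resulting values would be wrapped in OffsetAndMetadata and passed as the offsets map to commitAsync.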
Answer 2:
I was curious and tested the behavior. As the docs say, what @haoyuwang wrote in his answer is correct (+1).
The reason behind it is quite simple. The committed offsets of a consumer group are stored in Kafka in the internal topic __consumer_offsets. This topic is compacted, which means it is meant to provide the latest value for a given key. In your case the key is the combination of consumer group, topic and partition, and the value is the offset.
If you now
- commit offset 10 and, because the commits are asynchronous, later
- commit offset 5
offset 5 will be the latest value in the __consumer_offsets topic. That means that after a restart or rebalance the next offset your consumer reads from that topic partition is offset 5 and not offset 10.
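The last-write-wins behavior described above can be sketched without a broker. The following plain-Java model is only an illustration: a HashMap stands in for the compacted __consumer_offsets topic, and the class name, method names and key format are assumptions, not Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

final class CompactedOffsetsSketch {

    // Stand-in for the compacted topic: one value per key, the latest write wins.
    private final Map<String, Long> latest = new HashMap<>();

    // Hypothetical key format combining consumer group, topic and partition.
    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    // Records a commit; the new value is never compared with the previous one.
    void commit(String group, String topic, int partition, long offset) {
        latest.put(key(group, topic, partition), offset);
    }

    // Latest stored offset for the key, or -1 if nothing was committed yet.
    long committed(String group, String topic, int partition) {
        return latest.getOrDefault(key(group, topic, partition), -1L);
    }

    public static void main(String[] args) {
        CompactedOffsetsSketch store = new CompactedOffsetsSketch();
        store.commit("AsyncCommitter5", "myOutputTopic", 0, 10L);
        store.commit("AsyncCommitter5", "myOutputTopic", 0, 5L); // arrives later
        System.out.println(store.committed("AsyncCommitter5", "myOutputTopic", 0)); // prints 5
    }
}
```

Nothing in this model rejects the lower value, which mirrors the point of the answer: the order of arrival decides the stored offset, not its magnitude.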
How to reproduce
You can reproduce and test this by (synchronously) committing an earlier offset after your regular commit, like this:
consumer.commitSync();
consumer.commitSync(commitFirstMessage);
where commitFirstMessage is defined as
TopicPartition zeroTopicPartition = new TopicPartition(topic, 0);
OffsetAndMetadata zeroOffset = new OffsetAndMetadata(0L);
Map<TopicPartition, OffsetAndMetadata> commitFirstMessage = new HashMap<>();
commitFirstMessage.put(zeroTopicPartition, zeroOffset);
EDIT:
How to avoid committing lower offsets with commitAsync
The book Kafka - The Definitive Guide recommends a pattern to avoid committing lower offsets when a commitAsync call is retried:
Retrying Async Commits: A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commitAsync callback. When you’re getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don’t retry because a newer commit was already sent.
An implementation could look like this (not actually tested!):
import java.time.Duration
import java.util.Properties
import java.util.concurrent.atomic.AtomicLong
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer, OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.{KafkaException, TopicPartition}
import scala.collection.JavaConverters._

object AsyncCommitWithCallback extends App {

  // define topic
  val topic = "myOutputTopic"

  // set properties
  val props = new Properties()
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "AsyncCommitter5")
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  // [set more properties...]

  // create KafkaConsumer and subscribe
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List(topic).asJavaCollection)

  // global sequence number, incremented once per new commit
  val atomicLong = new AtomicLong(0)

  // consume messages
  try {
    while (true) {
      val records = consumer.poll(Duration.ofMillis(1)).asScala
      if (records.nonEmpty) {
        for (data <- records) {
          // do something with the records
        }
        consumer.commitAsync(new KeepOrderAsyncCommit)
      }
    }
  } catch {
    case ex: KafkaException => ex.printStackTrace()
  } finally {
    consumer.commitSync()
    consumer.close()
  }

  class KeepOrderAsyncCommit extends OffsetCommitCallback {
    // sequence number of this commit, fixed when the callback instance is created
    val position = atomicLong.incrementAndGet()

    override def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata], exception: Exception): Unit = {
      // retry only if the commit failed and no newer commit incremented the global counter
      if (exception != null && position == atomicLong.get) {
        consumer.commitAsync(this)
      }
    }
  }
}
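The retry decision itself does not depend on Kafka, so it can be isolated and checked on its own. Below is a minimal plain-Java sketch of the sequence-number guard from the quoted pattern; the class and method names are hypothetical, and wiring it into an OffsetCommitCallback is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CommitSequenceGuard {

    // Sequence number of the most recently issued commit.
    private final AtomicLong latestSequence = new AtomicLong(0);

    // Call once per new commit; the returned number travels with that commit's callback.
    long nextSequence() {
        return latestSequence.incrementAndGet();
    }

    // A failed commit may be retried only if no newer commit was issued in the meantime.
    boolean shouldRetry(long sequenceOfFailedCommit) {
        return sequenceOfFailedCommit == latestSequence.get();
    }

    public static void main(String[] args) {
        CommitSequenceGuard guard = new CommitSequenceGuard();
        long first = guard.nextSequence();
        System.out.println(guard.shouldRetry(first));  // true: nothing newer was sent
        long second = guard.nextSequence();
        System.out.println(guard.shouldRetry(first));  // false: a newer commit exists
        System.out.println(guard.shouldRetry(second)); // true
    }
}
```

A retried commit keeps the sequence number of the original attempt, so it stops being retried as soon as any newer commit has been issued.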
Source: https://stackoverflow.com/questions/63740084/kafkaconsumer-commitasync-behavior-with-a-lower-offset-than-previous