Spring Batch - Kafka: KafkaItemReader reads the data ALWAYS from beginning

Submitted by 最后都变了 on 2020-08-10 06:12:19

Question


I want to use Spring Batch to consume data from Kafka. This spring-tips link has a basic example for the same.

Here's my reader:

  @Bean
  KafkaItemReader<String, String> kafkaItemReader() {
    var props = new Properties();
    props.putAll(this.properties.buildConsumerProperties());

    return new KafkaItemReaderBuilder<String, String>()
        .partitions(0)
        .consumerProperties(props)
        .name("customers-reader")
        .saveState(true)
        .topic("test-consumer")
        .build();
  }

My application.yml file:

 spring:
    kafka:
      consumer:
        bootstrap-servers: localhost:9092
        group-id: groupid-Dev
        enable-auto-commit: false
        auto-offset-reset: latest
        auto-commit-interval: 1000
        key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
        value-deserializer: org.apache.kafka.common.serialization.StringDeserializer

Issue:

  • Every time I launch a job, it seeks to offset 0, so I get messages from the beginning. Is this a bug?
  • Why do we need to manually supply the partitions to read from? What if that changes in the future, wouldn't it affect my code?

Answer 1:


Every time when I launch a job, it seeks 0th Offset. So, I am getting messages from beginning. Is this a bug?

No, this is a feature (seriously) :-) The choice to make the Kafka item reader read from the beginning of the partition was made for consistency with the other readers (they all start from the beginning of their datasource). But in the Kafka world, where the offset is a first-class concept, we will make the starting offset configurable (we have a PR for this). This will ship in the upcoming v4.3, planned for October 2020.
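For reference, the configurable starting offset mentioned above shipped in Spring Batch 4.3 as `partitionOffsets` on the builder. A minimal sketch, assuming the same topic and consumer properties as the question (the offset value 42 is purely illustrative):

```java
@Bean
KafkaItemReader<String, String> kafkaItemReader() {
  var props = new Properties();
  props.putAll(this.properties.buildConsumerProperties());

  // Start partition 0 of the topic at offset 42 instead of the beginning.
  // Passing an empty map instead means "resume from the offsets committed
  // for the consumer group".
  var offsets = new HashMap<TopicPartition, Long>();
  offsets.put(new TopicPartition("test-consumer", 0), 42L);

  return new KafkaItemReaderBuilder<String, String>()
      .partitions(0)
      .partitionOffsets(offsets) // available since Spring Batch 4.3
      .consumerProperties(props)
      .name("customers-reader")
      .saveState(true)
      .topic("test-consumer")
      .build();
}
```

Note that with `saveState(true)`, a restarted job instance still resumes from the offsets stored in the Spring Batch execution context; `partitionOffsets` controls where a fresh job instance starts.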

Why do we need to manually supply partitions to read from?

Because Spring Batch cannot decide which partitions to read from for a given topic name. We are open to suggestions about a reasonable default here.
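Until there is such a default, one workaround is to look the partitions up at configuration time rather than hard-coding them. A hedged sketch, assuming the broker is reachable when the bean is created (`partitionsFor` is a standard kafka-clients call; the surrounding bean wiring mirrors the question):

```java
@Bean
KafkaItemReader<String, String> kafkaItemReader() {
  var props = new Properties();
  props.putAll(this.properties.buildConsumerProperties());

  // Ask the broker which partitions the topic currently has, so the
  // reader does not silently ignore partitions added later.
  List<Integer> partitions;
  try (var consumer = new KafkaConsumer<String, String>(props)) {
    partitions = consumer.partitionsFor("test-consumer").stream()
        .map(PartitionInfo::partition)
        .collect(Collectors.toList());
  }

  return new KafkaItemReaderBuilder<String, String>()
      .partitions(partitions) // List<Integer> overload of the builder
      .consumerProperties(props)
      .name("customers-reader")
      .saveState(true)
      .topic("test-consumer")
      .build();
}
```

The lookup happens once at startup; if partitions are added while the job is running, the reader still sees only the set it was built with.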




Answer 2:


Answer to 1st question:

You set enable-auto-commit: false. In this case you must commit offsets manually, or you can set enable-auto-commit to true. Otherwise, because you never commit offsets, your current offset will always be zero.
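For context, "committing offsets manually" with a plain Kafka consumer looks like the following. This is a generic kafka-clients sketch, independent of Spring Batch; `props` is assumed to carry the same consumer properties as the question, and `running` and `process` are placeholders for your own loop control and business logic:

```java
try (var consumer = new KafkaConsumer<String, String>(props)) {
  consumer.subscribe(Collections.singletonList("test-consumer"));
  while (running) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
      process(record); // placeholder for your business logic
    }
    // With enable-auto-commit=false, nothing is committed until you do this:
    consumer.commitSync();
  }
}
```

Committing after the batch has been processed gives at-least-once delivery: on a crash before `commitSync()`, the uncommitted records are redelivered.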

Answer to 2nd question:

You don't have to manually supply the partitions to read from. You can just set the topic to subscribe to, and Kafka will distribute that topic's partitions evenly among the consumers in the same consumer group.



Source: https://stackoverflow.com/questions/59853711/spring-batch-kafka-kafkaitemreader-reads-the-data-always-from-beginning
