Question
I have two Kafka (2.11-0.11.0.1) brokers. The default replication factor for topics is set to 2. Producers write data only to partition zero.
I also have a scheduled executor which runs the task periodically. When it consumes a topic with a small number of records per minute (around 100), it works like a charm. But for high-volume topics (around 10K records per minute), poll() returns no data.
The task is:
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public final class TopicToDbPump implements Runnable {
    private static final Logger log = LoggerFactory.getLogger(TopicToDbPump.class);

    private final String topic;
    private final TopicPartition topicPartition;
    private final Properties properties;

    public TopicToDbPump(String topic, Properties properties) {
        this.topic = topic;
        topicPartition = new TopicPartition(topic, 0);
        this.properties = properties;
    }

    @Override
    public void run() {
        try (final Consumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            consumer.assign(Collections.singleton(topicPartition));

            final long offset = readOffsetFromDb(topic);
            consumer.seek(topicPartition, offset);

            final ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            if (records.isEmpty()) {
                log.debug("No data from topic " + topic + " available");
                return;
            }

            saveData(records.records(topic));
        } catch (Throwable t) {
            log.error("Etl process " + topic + " failed with exception", t);
        }
    }
}
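For context, the periodic scheduling mentioned above might look roughly like this (a sketch; the topic name "my-topic", the one-minute period and the single-threaded executor are assumptions, not details from the question):

import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class EtlScheduler {
    public static void main(String[] args) {
        // Consumer properties as listed below in the question.
        Properties properties = new Properties();

        // Hypothetical topic name and period: run the pump once a minute.
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(new TopicToDbPump("my-topic", properties), 0, 1, TimeUnit.MINUTES);
    }
}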
The consumer parameters are:
"bootstrap.servers" = "host-1:9092,host-2:9092",
"group.id" = "my-group",
"enable.auto.commit" = "false",
"key.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"value.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"max.partition.fetch.bytes" = "50000000",
"max.poll.records" = "10000"
What's wrong?
Answer 1:
The Kafka Consumer API does not guarantee that the first call to poll() will return any data.
The consumer first has to connect to the cluster and discover the leaders of all partitions it is assigned to. As you can imagine, this can take a few seconds, so it is unlikely that data will have arrived immediately.
You should instead call poll() several times if no data is returned at first.
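Applied to the task above, the run() body could be adapted along these lines (a sketch; the 10-second overall deadline is an arbitrary assumption):

// Inside run(), after consumer.assign(...) and consumer.seek(...):
final long deadline = System.currentTimeMillis() + 10_000L; // assumed upper bound of 10 seconds
ConsumerRecords<String, String> records = ConsumerRecords.empty();
while (records.isEmpty() && System.currentTimeMillis() < deadline) {
    // The first calls to poll() also drive metadata fetching and leader
    // discovery, so they may legitimately return nothing.
    records = consumer.poll(Duration.ofSeconds(1));
}
if (records.isEmpty()) {
    log.debug("No data from topic " + topic + " available");
    return;
}
saveData(records.records(topic));

Alternatively, keeping one long-lived KafkaConsumer instead of creating a new one on every scheduled run would avoid paying the connection and metadata-discovery cost each time.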
Source: https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data