Can anyone tell me how to read messages from the beginning with the Kafka Consumer API every time I run the consumer?
Another option is to keep your consumer code simple and manage the offsets from outside, using the command-line tool kafka-consumer-groups that comes with Kafka.
Each time, before starting the consumer (the tool only resets offsets while the group has no active members), you would call
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--execute --reset-offsets \
--group myConsumerGroup \
--topic myTopic \
--to-earliest
Depending on your requirements, you can reset the offsets for each partition of the topic with that tool. The tool's help output and the documentation explain the options:
--reset-offsets also has following scenarios to choose from (atleast one scenario must be selected):
--to-datetime <String: datetime> : Reset offsets to offsets from datetime. Format: 'YYYY-MM-DDTHH:mm:SS.sss'
--to-earliest : Reset offsets to earliest offset.
--to-latest : Reset offsets to latest offset.
--shift-by <Long: number-of-offsets> : Reset offsets shifting current offset by 'n', where 'n' can be positive or negative.
--from-file : Reset offsets to values defined in CSV file.
--to-current : Resets offsets to current offset.
--by-duration <String: duration> : Reset offsets to offset by duration from current timestamp. Format: 'PnDTnHnMnS'
--to-offset : Reset offsets to a specific offset.
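If you would rather trigger the reset from your launcher code than from a shell script, something along these lines could work. This is only a sketch: the class name OffsetResetCommand and its methods are made up for illustration, the script path and group/topic names are the ones from the command above, and it assumes the Kafka distribution's bin directory is reachable from the working directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Kafka): assembles the same
// kafka-consumer-groups.sh call as the command shown above.
final class OffsetResetCommand {

    // Builds the argument list for a --to-earliest reset of one topic.
    static List<String> build(String script, String bootstrapServer,
                              String group, String topic) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add("--bootstrap-server");
        cmd.add(bootstrapServer);
        cmd.add("--execute");
        cmd.add("--reset-offsets");
        cmd.add("--group");
        cmd.add(group);
        cmd.add("--topic");
        cmd.add(topic);
        cmd.add("--to-earliest");
        return cmd;
    }

    // Runs the command, forwarding its output to this process,
    // and returns the tool's exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

You would call OffsetResetCommand.run(OffsetResetCommand.build("bin/kafka-consumer-groups.sh", "localhost:9092", "myConsumerGroup", "myTopic")) and only create the KafkaConsumer once that has returned, since the reset needs the group to be inactive.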
One option is to use a unique group id each time you start. A group with no committed offsets falls back to auto.offset.reset, so with that set to earliest (the new consumer defaults to latest) Kafka sends you the messages in the topic from the beginning. Do something like this when you set your properties for KafkaConsumer:
properties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
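For a fuller picture, here is a rough sketch of the whole property set for such a throwaway group. The class name FreshGroupProps is hypothetical, and the plain string keys are used instead of the ConsumerConfig constants only so the snippet compiles without kafka-clients on the classpath ("group.id" is the value of ConsumerConfig.GROUP_ID_CONFIG, and so on). Turning off auto commit and using String deserializers are my assumptions, not requirements.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper: consumer properties for a one-off group that
// always starts reading at the earliest available offset.
final class FreshGroupProps {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // A new random group id on every start means no committed offsets exist.
        props.put("group.id", UUID.randomUUID().toString());
        // With no committed offsets, this decides where reading starts.
        props.put("auto.offset.reset", "earliest");
        // Assumption: a throwaway group has no use for committed offsets.
        props.put("enable.auto.commit", "false");
        // Assumption: keys and values are plain strings.
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

You can pass the result straight to new KafkaConsumer<String, String>(FreshGroupProps.build("localhost:9092")). Keep in mind that every run leaves another group behind on the broker until its metadata expires.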
The other option is to use consumer.seekToBeginning(consumer.assignment()), but this will not work unless Kafka first gets a heartbeat from your consumer, which happens when the consumer calls the poll method. So call poll(), then seekToBeginning(), and then poll() again if you want all the records from the start. It's a little hacky, but this seems to be the most reliable way to do it as of the 0.9 release.
// At this point there is no heartbeat from the consumer, so seekToBeginning() won't work
// So call poll()
consumer.poll(0);
// Now there is heartbeat and consumer is "alive"
consumer.seekToBeginning(consumer.assignment());
// Now consume
ConsumerRecords<String, String> records = consumer.poll(0);
When using the high-level consumer, set props.put("auto.offset.reset", "smallest"); when creating the ConsumerConfig.
1) https://stackoverflow.com/a/17084401/3821653
2) http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3CCAOG_4QYz2ynH45a8kXb8qw7xw4vDRRwNqMn5j9ERFxJ8RfKGCg@mail.gmail.com%3E
To reset the consumer group when its offsets are stored in Zookeeper (the old high-level consumer), you can delete the group's path in Zookeeper:
import kafka.utils.ZkUtils;
ZkUtils.maybeDeletePath(<zkhost:zkport>, </consumers/group.id>);