How to delete data which has already been consumed by the consumer? (Kafka)

Submitted by 此生再无相见时 on 2021-01-28 03:51:33

Question


I am doing data replication in Kafka, but the Kafka log file grows very quickly, reaching 5 GB in a day. As a solution to this problem, I want to delete processed data immediately. I am using the deleteRecords method of AdminClient to delete records up to an offset, but when I look at the log file, the data corresponding to that offset is not deleted.

// Delete everything in this partition before the given offset.
RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(offset);
TopicPartition topicPartition = new TopicPartition(topicName, partition);
Map<TopicPartition, RecordsToDelete> deleteConf = new HashMap<>();
deleteConf.put(topicPartition, recordsToDelete);
adminClient.deleteRecords(deleteConf);
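For reference, adminClient here is an org.apache.kafka.clients.admin.AdminClient; a minimal construction sketch, with a placeholder bootstrap address, looks like this:

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker list
AdminClient adminClient = AdminClient.create(props);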

I don't want suggestions like log.retention.hours, log.retention.bytes, log.segment.bytes, or log.cleanup.policy=delete.

I only want to delete data that has already been consumed by the consumer. With those retention settings, data that has not yet been consumed would be deleted as well.

What are your suggestions?


Answer 1:


You didn't do anything wrong. The code you provided works and I've tested it. Just in case I've overlooked something in your code, mine is:

// Deletes all records in the given partition with offsets below beforeIndex.
public void deleteMessages(String topicName, int partitionIndex, int beforeIndex) {
    TopicPartition topicPartition = new TopicPartition(topicName, partitionIndex);
    Map<TopicPartition, RecordsToDelete> deleteMap = new HashMap<>();
    deleteMap.put(topicPartition, RecordsToDelete.beforeOffset(beforeIndex));
    kafkaAdminClient.deleteRecords(deleteMap);
}
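For example, a hypothetical call that trims partition 0 of a topic up to offset 100 (topic name and offset are placeholders):

// Everything before offset 100 on partition 0 becomes unreadable,
// and the underlying log segments become eligible for deletion.
deleteMessages("my-topic", 0, 100);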

I've used group: 'org.apache.kafka', name: 'kafka-clients', version: '2.0.0'

So check that you are targeting the right partition (0 is the first one).

Check your broker version: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html says:

This operation is supported by brokers with version 0.11.0.0

Produce the messages from the same application, to be sure you're connected properly.

There is one more option you can consider: using cleanup.policy=compact. If your message keys repeat, you could benefit from it, not only because older messages for a key are deleted automatically, but also because a message with a null payload deletes all the messages for that key. Just don't forget to set delete.retention.ms and min.compaction.lag.ms to small enough values. In that case you can consume a message and then produce a null payload for the same key (but be cautious with this approach, since it can also delete messages with that key that you haven't consumed yet). A sketch of this tombstone approach is shown below.
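A minimal sketch of the tombstone idea, assuming a compacted topic with string keys; the topic name, key, and bootstrap address are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker list
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // A null value is a tombstone: once the log cleaner runs (and after
    // delete.retention.ms elapses), all records with this key are removed.
    producer.send(new ProducerRecord<>("my-topic", "my-key", null));
}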




Answer 2:


Try this

// recordsToDelete is the Map<TopicPartition, RecordsToDelete> built as in the question.
DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = result.lowWatermarks();
try {
    // Block on each future; lowWatermark() is the new earliest readable offset.
    for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> entry : lowWatermarks.entrySet()) {
        System.out.println(entry.getKey().topic() + " " + entry.getKey().partition() + " " + entry.getValue().get().lowWatermark());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
adminClient.close();

In this code, you need to call entry.getValue().get().lowWatermark(), because adminClient.deleteRecords(recordsToDelete) returns a map of Futures; you need to wait for each Future to complete by calling get().
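If you only need to wait for all deletions to finish, rather than inspect each low watermark, DeleteRecordsResult also exposes an aggregate future:

// Completes once every partition's deletion has finished,
// or fails if any of them failed.
result.all().get();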



Source: https://stackoverflow.com/questions/52961751/how-to-delete-data-which-already-been-consumed-by-consumer-kafka
