Is there a way to purge the topic in Kafka?
I pushed a message that was too big into a Kafka message topic on my local machine, and now I'm getting an invalid message size error.
Thomas' advice is great, but unfortunately zkCli in old versions of ZooKeeper (for example 3.3.6) does not seem to support rmr. For example, compare the command line implementation in modern ZooKeeper with version 3.3.
If you are faced with an old version of ZooKeeper, one solution is to use a client library such as zc.zk for Python. For people not familiar with Python: you need to install it using pip or easy_install. Then start a Python shell (python) and you can do:
import zc.zk

# connect to the local ZooKeeper and recursively delete the topic's znodes
zk = zc.zk.ZooKeeper('localhost:2181')
zk.delete_recursive('brokers/MyTopic')
or even
zk.delete_recursive('brokers')
if you want to remove all the topics from Kafka.
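With a modern ZooKeeper whose zkCli does support rmr (see the comparison above), the same recursive delete can be done straight from the ZooKeeper shell:
zkCli.sh -server localhost:2181
rmr /brokers/MyTopic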
From Java, using the new AdminZkClient instead of the deprecated AdminUtils:
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.zk.AdminZkClient;
import kafka.zk.KafkaZkClient;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.errors.TopicExistsException;
import org.apache.kafka.common.utils.Time;
import org.junit.Assert;

public void reset() {
    try (KafkaZkClient zkClient = KafkaZkClient.apply("localhost:2181", false, 200_000,
            5000, 10, Time.SYSTEM, "metricGroup", "metricType")) {
        for (Map.Entry<String, List<PartitionInfo>> entry : listTopics().entrySet()) {
            deleteTopic(entry.getKey(), zkClient);
        }
    }
}

private void deleteTopic(String topic, KafkaZkClient zkClient) {
    // skip Kafka internal topics such as __consumer_offsets
    if (topic.startsWith("__")) {
        return;
    }
    System.out.println("Resetting Topic: " + topic);
    AdminZkClient adminZkClient = new AdminZkClient(zkClient);
    adminZkClient.deleteTopic(topic);

    // deletions are not instantaneous: retry re-creating the topic for up
    // to five seconds, sleeping briefly while the old topic still exists
    boolean success = false;
    int maxMs = 5_000;
    while (maxMs > 0 && !success) {
        try {
            maxMs -= 100;
            adminZkClient.createTopic(topic, 1, 1, new Properties(), null);
            success = true;
        } catch (TopicExistsException ignored) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    if (!success) {
        Assert.fail("failed to create " + topic);
    }
}

private Map<String, List<PartitionInfo>> listTopics() {
    Properties props = new Properties();
    // kafkaContainer is assumed to be the Testcontainers Kafka instance used by this test
    props.put("bootstrap.servers", kafkaContainer.getBootstrapServers());
    props.put("group.id", "test-container-consumer-group");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    Map<String, List<PartitionInfo>> topics = consumer.listTopics();
    consumer.close();
    return topics;
}
The simplest approach is to set the date of the individual log files to be older than the retention period. Then the broker should clean them up and remove them for you within a few seconds. This offers several advantages: it is a runtime operation that requires no broker shutdown, and it avoids the invalid offset exceptions described below.
In my experience with Kafka 0.7.x, removing the log files and restarting the broker could lead to invalid offset exceptions for certain consumers. This would happen because the broker restarts the offsets at zero (in the absence of any existing log files), and a consumer that was previously consuming from the topic would reconnect to request a specific [once valid] offset. If this offset happens to fall outside the bounds of the new topic logs, then no harm is done and the consumer resumes at either the beginning or the end. But if the offset falls within the bounds of the new topic logs, the broker attempts to fetch the message set but fails because the offset doesn't align with an actual message.
This could be mitigated by also clearing the consumer offsets in ZooKeeper for that topic. But if you don't need a virgin topic and just want to remove the existing contents, then simply 'touch'-ing a few topic logs is far easier and more reliable than stopping brokers, deleting topic logs, and clearing certain ZooKeeper nodes.
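For example, a minimal sketch assuming the default /tmp/kafka-logs log directory used elsewhere in this thread, a single-partition topic named MyTopic, and the default 7-day retention (GNU touch syntax):
touch -d "8 days ago" /tmp/kafka-logs/MyTopic-0/*.log
The broker should then clean those segments up as described above.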
Temporarily update the retention time on the topic to one second:
kafka-topics.sh --zookeeper <zkhost>:2181 --alter --topic <topic name> --config retention.ms=1000
And in newer Kafka releases, you can also do it with kafka-configs --entity-type topics:
kafka-configs.sh --zookeeper <zkhost>:2181 --entity-type topics --alter --entity-name <topic name> --add-config retention.ms=1000
Then wait for the purge to take effect (about one minute). Once purged, restore the previous retention.ms value.
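For example, if the topic previously used the broker default of 7 days (604800000 ms), restoring it might look like this (substitute whatever value a --describe on the topic reported before the change):
kafka-configs.sh --zookeeper <zkhost>:2181 --entity-type topics --alter --entity-name <topic name> --add-config retention.ms=604800000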
Could not add this as a comment because of size. I'm not sure if this is true, but besides updating retention.ms and retention.bytes, I noticed the topic cleanup policy should be "delete" (the default); if it is "compact", the topic will hold on to messages longer, i.e. with "compact" you also have to specify delete.retention.ms.
./bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-name test-topic-3-100 --entity-type topics
Configs for topics:test-topic-3-100 are retention.ms=1000,delete.retention.ms=10000,cleanup.policy=delete,retention.bytes=1
I also had to monitor the earliest and latest offsets until they were the same, to confirm the purge had actually happened (in the GetOffsetShell runs below, --time -1 returns the latest offset and --time -2 the earliest); you can also check with du -h /tmp/kafka-logs/test-topic-3-100-*
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "BROKER:9095" --topic test-topic-3-100 --time -1 | awk -F ":" '{sum += $3} END {print sum}'
26599762
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "BROKER:9095" --topic test-topic-3-100 --time -2 | awk -F ":" '{sum += $3} END {print sum}'
26599762
The other problem is that you have to fetch the current config first, so you remember to revert it after the deletion is successful:
./bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-name test-topic-3-100 --entity-type topics
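Alternatively, if the topic had no explicit retention.ms override before you started, you can remove the override entirely so the broker default applies again, using the --delete-config option:
./bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-name test-topic-3-100 --entity-type topics --delete-config retention.ms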
Here are the steps I follow to delete a topic named MyTopic:

1. Stop the Apache Kafka daemon
2. Delete the topic data folder: rm -rf /tmp/kafka-logs/MyTopic-0. Repeat for other partitions, and all replicas
3. Connect to ZooKeeper: zkCli.sh, then rmr /brokers/MyTopic
4. Start the Apache Kafka daemon again

If you miss step 3, then Apache Kafka will continue to report the topic as present (for example, when you run kafka-list-topic.sh).
Tested with Apache Kafka 0.8.0.
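For what it's worth, newer releases (0.8.2 and later) also have built-in topic deletion that replaces the manual steps above, provided the brokers are configured with delete.topic.enable=true:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic MyTopic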