What is the best way to create topics in Kafka?
In the new producer API, when I try
When you start your Kafka broker, you can define a set of properties in the conf/server.properties file. This file is just a key-value properties file. One of the properties is auto.create.topics.enable; if it is set to true (the default), Kafka will create topics automatically when you send messages to non-existent topics.
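Because server.properties is a plain key=value file, its topic-related settings can be read with java.util.Properties. The sketch below is a hypothetical helper (`BrokerTopicSettings` is not part of Kafka) that applies the broker defaults as I understand them from the Kafka configuration docs: auto.create.topics.enable=true, num.partitions=1, default.replication.factor=1. The last two are what an auto-created topic gets.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: reads the settings that govern auto-created topics from
// the text of a server.properties file, falling back to the broker defaults.
final class BrokerTopicSettings {
    final boolean autoCreateTopics;   // auto.create.topics.enable (default true)
    final int numPartitions;          // num.partitions (default 1)
    final int defaultReplication;     // default.replication.factor (default 1)

    private BrokerTopicSettings(boolean autoCreateTopics, int numPartitions, int defaultReplication) {
        this.autoCreateTopics = autoCreateTopics;
        this.numPartitions = numPartitions;
        this.defaultReplication = defaultReplication;
    }

    static BrokerTopicSettings parse(String serverProperties) {
        Properties props = new Properties();
        try {
            // key=value lines, '#' comments: exactly what Properties.load understands
            props.load(new StringReader(serverProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        boolean autoCreate = Boolean.parseBoolean(
                props.getProperty("auto.create.topics.enable", "true").trim());
        int partitions = Integer.parseInt(props.getProperty("num.partitions", "1").trim());
        int replication = Integer.parseInt(
                props.getProperty("default.replication.factor", "1").trim());
        return new BrokerTopicSettings(autoCreate, partitions, replication);
    }
}
```

For example, a file that only contains broker.id=0 yields autoCreateTopics == true, so sending to an unknown topic would create it with 1 partition and replication factor 1.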
All the config options are defined here. IMHO, a simple rule for creating topics is the following: the number of replicas cannot be more than the number of nodes you have. The number of topics and partitions is unaffected by the number of nodes in your cluster.
For example:
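A minimal sketch of that rule as a pre-creation check, assuming you know the broker count up front. `TopicPlan.isValid` is a hypothetical helper, not a Kafka API: it rejects a replication factor larger than the broker count and deliberately puts no broker-based limit on the partition count.

```java
// Hypothetical check for the rule of thumb above.
final class TopicPlan {
    static boolean isValid(int partitions, int replicationFactor, int brokerCount) {
        if (partitions < 1 || replicationFactor < 1 || brokerCount < 1) {
            return false;                        // nonsensical input
        }
        // Each replica of a partition must live on a different broker,
        // so the replication factor is capped by the broker count.
        if (replicationFactor > brokerCount) {
            return false;
        }
        // The partition count is not bounded by the number of brokers.
        return true;
    }
}
```

So 50 partitions with replication factor 3 is fine on a 3-node cluster, while replication factor 4 is not.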
I'd like to share a recent experience, which I described in my blog post "The Side Effect of Fetching Kafka Topic Metadata", and also give my answers to some of the questions raised here.
1) What is the best way to create topics in Kafka? Do we need to create a topic before publishing messages?
I think if we know in advance that we are going to use a Kafka topic with a fixed name, we are better off creating the topic before we write messages to it or read messages from it. This can typically be done in a post-startup script using bin/kafka-topics.sh (see the official documentation for an example), or with KafkaAdminClient, which was introduced in Kafka 0.11.0.0.
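If the post-startup script is driven from Java, the kafka-topics.sh invocation can be assembled as an argument list and handed to ProcessBuilder. This is a sketch under assumptions: Kafka 2.2 or newer (which accepts --bootstrap-server; older versions used --zookeeper), the script living under kafkaHome/bin, and `TopicCommand` being a hypothetical helper name. --if-not-exists makes reruns of the script harmless.

```java
import java.util.List;

// Hypothetical helper: builds "kafka-topics.sh --create ..." for a startup script.
final class TopicCommand {
    static List<String> createCommand(String kafkaHome, String bootstrapServers, String topic,
                                      int partitions, int replicationFactor) {
        return List.of(
                kafkaHome + "/bin/kafka-topics.sh",
                "--create",
                "--if-not-exists",                  // no error if the topic already exists
                "--bootstrap-server", bootstrapServers,
                "--topic", topic,
                "--partitions", Integer.toString(partitions),
                "--replication-factor", Integer.toString(replicationFactor));
    }
}
```

A script could then run it with something like `new ProcessBuilder(TopicCommand.createCommand("/opt/kafka", "localhost:9092", "orders", 3, 2)).inheritIO().start().waitFor()`, where the path, address and topic name are placeholders.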
On the other hand, I do see certain cases where we need to generate a topic name on the fly. In these cases we cannot know the topic name in advance, so we can rely on the auto.create.topics.enable property. When it is enabled, the topic is created automatically. This brings up the second question:
2) Which actions cause topic creation when auto.create.topics.enable is true?
As @Lan already pointed out:
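When topic names are generated on the fly, it helps to check them before the first produce or metadata request, because an illegal name is rejected rather than auto-created. The sketch below is a hypothetical helper (`TopicNames`, and the "events-" prefix, are illustrations) that mirrors the naming rules Kafka enforces as far as I know: 1 to 249 characters drawn from [a-zA-Z0-9._-], and not "." or "..".

```java
import java.util.regex.Pattern;

// Hypothetical validator for dynamically generated topic names.
final class TopicNames {
    private static final Pattern LEGAL_CHARS = Pattern.compile("[a-zA-Z0-9._-]+");
    private static final int MAX_LENGTH = 249;

    static boolean isValid(String name) {
        if (name == null || name.isEmpty() || name.length() > MAX_LENGTH) {
            return false;
        }
        if (name.equals(".") || name.equals("..")) {
            return false;
        }
        return LEGAL_CHARS.matcher(name).matches();
    }

    // Example generator: one topic per tenant (the prefix is arbitrary).
    static String forTenant(String tenantId) {
        String name = "events-" + tenantId;
        if (!isValid(name)) {
            throw new IllegalArgumentException("illegal topic name: " + name);
        }
        return name;
    }
}
```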
If this is set to true, when applications attempt to produce, consume, or fetch metadata for a non-existent topic, Kafka will automatically create the topic with the default replication factor and number of partitions.
To put it even more simply:
If auto topic creation is enabled on the Kafka brokers, whenever a broker sees a specific topic name, that topic will be created if it does not already exist.
The fact that fetching metadata automatically creates the topic is often overlooked, including by me. A specific example is the consumer.partitionsFor(topic) API: this method creates the given topic if it does not exist.
If you are interested in more details, take a look at my blog post on this topic, "The Side Effect of Fetching Kafka Topic Metadata".
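To make the side effect concrete, here is a toy, in-memory model. It is not Kafka's code and `ToyBroker` is a hypothetical class. It only captures the behaviour described above: with auto-creation enabled, any request that names an unknown topic, including a read-only-looking metadata lookup, creates that topic with the broker's default partition count.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side auto topic creation (illustration only).
final class ToyBroker {
    private final boolean autoCreateTopics;
    private final int defaultPartitions;
    private final Map<String, Integer> topics = new HashMap<>(); // topic -> partition count

    ToyBroker(boolean autoCreateTopics, int defaultPartitions) {
        this.autoCreateTopics = autoCreateTopics;
        this.defaultPartitions = defaultPartitions;
    }

    // Stands in for a metadata request, e.g. the one behind consumer.partitionsFor(topic).
    // Returns the partition count, or -1 if the topic does not exist and was not created.
    int fetchMetadata(String topic) {
        Integer partitions = topics.get(topic);
        if (partitions == null && autoCreateTopics) {
            partitions = defaultPartitions;
            topics.put(topic, partitions); // the side effect: a lookup creates the topic
        }
        return partitions == null ? -1 : partitions;
    }

    // Producers and consumers look up metadata first, so they trigger the same path.
    void produce(String topic) {
        fetchMetadata(topic);
    }

    boolean exists(String topic) {
        return topics.containsKey(topic);
    }
}
```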
Set the property
auto.create.topics.enable=true
in your server.properties file. If you have multiple brokers, do the same in every server*.properties file and restart your Kafka servers.
But make sure you set num.partitions=<int> in the server*.properties files to an appropriate number;
otherwise there will be a performance issue if you increase the number of partitions later.
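One concrete side effect of increasing the partition count later is that keyed messages get remapped: a key-based partitioner picks hash(key) mod partitionCount, so changing the count sends many existing keys to different partitions, and per-key ordering no longer holds across the change. The sketch uses a simplified stand-in (Integer keys hashed with hashCode) rather than Kafka's real default partitioner, which applies murmur2 to the serialized key; `KeyRemap` is a hypothetical name.

```java
import java.util.Objects;

// Simplified key partitioner: hash(key) mod numPartitions (not Kafka's murmur2).
final class KeyRemap {
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(Objects.hashCode(key), numPartitions);
    }

    // Counts how many of the keys 0..keyCount-1 land on a different partition
    // when the partition count changes from 'before' to 'after'.
    static int movedKeys(int keyCount, int before, int after) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (partitionFor(key, before) != partitionFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }
}
```

With Integer keys 0 to 99, going from 3 to 4 partitions moves 73 of the 100 keys; only keys congruent to 0, 1 or 2 mod 12 stay put.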
The basic level of parallelism in Kafka is the partition. On both the producer and the broker side, writes to different partitions can be done fully in parallel.
Things to keep in mind:
As a rule of thumb, it’s probably a good idea to limit the number of partitions per broker to 100 x b x r, where b is the number of brokers and r is the replication factor.
For example, if you have 9 brokers/nodes in your cluster, your topic could have
EDIT: See the article "How to choose the number of topics/partitions in a Kafka cluster?" for further details (this answer was taken from there).
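The rule of thumb is plain arithmetic, so it can be written as a one-line helper (`PartitionBudget` is a hypothetical name). The replication factor of 3 in the usage below is my assumption for illustration, since the 9-broker example above is cut off.

```java
// Rule of thumb from above: partitions per broker <= 100 x b x r,
// with b = number of brokers and r = replication factor.
final class PartitionBudget {
    static long maxPartitionsPerBroker(int brokers, int replicationFactor) {
        return 100L * brokers * replicationFactor; // long avoids int overflow for large inputs
    }
}
```

With b = 9 brokers and an assumed r = 3, the guideline gives 100 x 9 x 3 = 2700 partitions per broker.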
You can create a topic programmatically with the AdminClient API:
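import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties config = new Properties();
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // try-with-resources closes the client and its background threads
        try (AdminClient admin = AdminClient.create(config)) {
            // creating new topic: 1 partition, replication factor 1
            System.out.println("-- creating --");
            NewTopic newTopic = new NewTopic("my-new-topic", 1, (short) 1);
            // createTopics() is asynchronous; all().get() waits for the broker to confirm
            admin.createTopics(Collections.singleton(newTopic)).all().get();
            //listing
            System.out.println("-- listing --");
            admin.listTopics().names().get().forEach(System.out::println);
        }
    }
}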
The number of partitions determines the parallelism of a topic, since within a consumer group each partition can be consumed by only one consumer. For example, if a topic has only 10 partitions and a consumer group has 20 consumers, 10 of the consumers are idle and receive no messages. The right number really depends on your application, but anything from 1 to thousands can be reasonable.
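The idle-consumer effect can be reproduced with a simplified assignment. This sketch gives partition p to consumer p % consumers; Kafka's real assignors (range, round-robin, sticky) distribute partitions differently, but under any of them each partition goes to at most one consumer in the group, so the idle count comes out the same. `GroupAssignment` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified group assignment (not Kafka's assignor): partition p -> consumer p % consumers.
final class GroupAssignment {
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> perConsumer = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            perConsumer.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            perConsumer.get(p % consumers).add(p); // each partition has exactly one owner
        }
        return perConsumer;
    }

    // Consumers that received no partitions at all.
    static long idleConsumers(int partitions, int consumers) {
        return assign(partitions, consumers).stream().filter(List::isEmpty).count();
    }
}
```

For the example in the text, idleConsumers(10, 20) returns 10; with 20 partitions and 10 consumers nobody is idle and each consumer owns two partitions.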
The replication factor is determined by your durability requirements. For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without losing any messages committed to the log. A replication factor of 3 is a common configuration. Of course, the replication factor must be less than or equal to the number of brokers.
The auto.create.topics.enable property controls whether Kafka automatically creates topics on the server. If it is set to true, when applications attempt to produce, consume, or fetch metadata for a non-existent topic, Kafka will automatically create the topic with the default replication factor and number of partitions. I would recommend turning it off in production and creating topics in advance.
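The replication arithmetic can be run in both directions: from a replication factor to the number of broker failures it tolerates, and from a failure budget back to the smallest replication factor that covers it. This is a sketch of the simplified model in the paragraph above; `Replication` and its methods are hypothetical helpers.

```java
// Replication arithmetic: factor N tolerates N - 1 broker failures, and N <= broker count.
final class Replication {
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - 1;
    }

    // Smallest replication factor that survives 'failures' lost brokers,
    // or -1 if the cluster does not have enough brokers to hold that many replicas.
    static int requiredReplicationFactor(int failures, int brokers) {
        int rf = failures + 1;
        return rf <= brokers ? rf : -1;
    }
}
```

For the common factor of 3, toleratedFailures(3) is 2; tolerating 3 failures would need 4 replicas, which a 3-broker cluster cannot host.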