Let's say I have multiple devices, and each device has different types of sensors. Now I want to send the data from each device, for each sensor, to Kafka. But I am confused about the topic and partition design.
It depends on your semantics:
For example, if you have different types of sensors collecting different data, you should use a separate topic for each type.
Since devices can be added or removed, and sensors can be added or removed as well, is there a way to create these topics and partitions on the fly?
If device metadata (to distinguish where the data comes from) is embedded in each message, you should use a single topic with many partitions to scale out. Adding new topics or partitions is possible, but must be done manually. A problem with adding new partitions is that it changes your data distribution and thus might break semantics. Therefore, best practice is to over-partition your topic from the beginning to avoid adding partitions later.
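To make the "embedded metadata" idea concrete, here is a minimal sketch of such a message: device metadata lives in the payload, and the device id is used as the message key so that the default hash partitioner keeps each device's data in one partition. The payload fields and partition count are hypothetical, and CRC32 stands in for Kafka's actual default partitioner hash (murmur2) just to keep the sketch dependency-free:

```python
import json
import zlib

NUM_PARTITIONS = 12  # hypothetical: over-partitioned from the start

def build_record(device_id: str, sensor_id: str, value: float) -> tuple[bytes, bytes]:
    """Build a (key, value) pair with device metadata embedded in the payload."""
    payload = {
        "device_id": device_id,   # metadata to distinguish where the data comes from
        "sensor_id": sensor_id,
        "value": value,
    }
    # Keying by device_id means all messages from one device land in one partition.
    return device_id.encode(), json.dumps(payload).encode()

def default_style_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for Kafka's default hash partitioner (which uses murmur2);
    CRC32 is used here only so the sketch runs without the Kafka client."""
    return zlib.crc32(key) % num_partitions

key, value = build_record("device-1", "temp", 21.5)
partition = default_style_partition(key)
```

Note how the hash-based mapping illustrates the caveat above: if `NUM_PARTITIONS` changes, `key % num_partitions` changes for existing keys, so a device's data would start landing in a different partition.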
If there is no embedded metadata, you would need multiple topics (e.g., per user or per device) to distinguish message origins.
As an alternative, a single topic with multiple partitions and a fixed mapping from device/sensor to partition -- via a custom partitioner -- would work, too. In this case, adding new partitions is no problem, as you control the data distribution and can keep it stable.
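A sketch of that fixed mapping, with a hypothetical assignment table: because the assignment is explicit rather than hash-based, growing the topic later does not move any existing device's data, you just assign new devices to the new partitions:

```python
# Fixed device -> partition mapping (hypothetical table). Distribution stays
# stable even if the topic later gains partitions, because old assignments
# never move; new devices are simply assigned to the new partitions.
DEVICE_PARTITION_MAP = {
    "device-1": 0,
    "device-2": 1,
    "device-3": 2,
}

def custom_partition(device_id: str) -> int:
    """Return the fixed partition for a device; raises KeyError for unknown
    devices so each new device gets an explicit assignment, not a silent hash."""
    return DEVICE_PARTITION_MAP[device_id]
```

With the Java client you would package this logic as a class implementing the `Partitioner` interface and set it via the `partitioner.class` producer config; with kafka-python you could instead pass `partition=custom_partition(device_id)` directly to `KafkaProducer.send`.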
Update
There is a blog post discussing this: https://www.confluent.io/blog/put-several-event-types-kafka-topic/
I would create topics based on sensors and partitions based on devices:
A sensor on Device 1 -> topic A, partition 1.
A sensor on Device 2 -> topic A, partition 2.
B sensor on Device 2 -> topic B, partition 2.
and so on.
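The scheme above boils down to a small routing function from (sensor type, device) to (topic, partition). The `topic-<sensor>` naming is hypothetical, and the 1-based partition numbers from the examples are kept for illustration (real Kafka partitions are numbered from 0):

```python
def route(sensor_type: str, device_number: int) -> tuple[str, int]:
    """Topic per sensor type, partition per device.
    Note: each topic needs at least as many partitions as there are devices,
    and Kafka partitions are actually 0-indexed; the 1-based numbers here
    simply mirror the examples in the answer."""
    return f"topic-{sensor_type}", device_number

a1 = route("A", 1)  # A sensor on device 1 -> topic A, partition 1
a2 = route("A", 2)  # A sensor on device 2 -> topic A, partition 2
b2 = route("B", 2)  # B sensor on device 2 -> topic B, partition 2
```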
I don't know what kinds of sensors you have, but they seem to belong semantically to the same set of data. With the help of partitions you can have parallel processing.
But it depends on how you want to process your data: is it more important to process sensors together, or devices?