Kafka topic per producer

盖世英雄少女心 2021-01-01 05:40

Let's say I have multiple devices. Each device has different types of sensors. Now I want to send the data from each device for each sensor to Kafka. But I am confused about t

2 answers
  • 2021-01-01 05:48

    It depends on your semantics:

    • a topic is a logical abstraction and should contain "unified" data, i.e., data with the same semantic meaning
    • a topic can easily be scaled out via its number of partitions

    For example, if you have different types of sensors collecting different data, you should use a topic for each type.

    "Since devices can be added or removed, sensors can also be added or removed. Is there a way to create these topics and partitions on the fly?"

    If device metadata (to distinguish where data comes from) is embedded in each message, you should use a single topic with many partitions to scale out. Adding new topics or partitions is possible but must be done manually. Adding new partitions can change your data distribution (with key-based partitioning, the same key may map to a different partition afterwards) and thus might break semantics. Best practice is therefore to over-partition your topic from the beginning, so that you do not need to add partitions later.
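
    To illustrate why adding partitions can change the data distribution: key-based partitioning typically picks a partition as hash(key) modulo the partition count, so changing the count remaps some keys. The following is only a sketch under assumptions: CRC32 is a stand-in hash (Kafka's Java default partitioner uses murmur2), and the device IDs and function names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition as hash(key) mod partition count (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition changes when the partition count changes."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


# Hypothetical device IDs used as message keys.
devices = [f"device-{i}" for i in range(100)]
moved = moved_keys(devices, old_count=4, new_count=6)
print(f"{len(moved)} of {len(devices)} keys land on a different partition")
```

    Any key that moves has its older messages in one partition and its newer messages in another, which is what can break per-key ordering.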

    If there is no embedded metadata, you would need multiple topics (e.g., per user or per device) to distinguish message origins.
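
    As a rough illustration of the two options, the sketch below builds a message with the origin embedded in the payload (single shared topic) and, for contrast, derives a per-device topic name for the case where the payload carries no origin. The field names, the "sensors." topic prefix, and both helper functions are assumptions made for this example, not something prescribed by Kafka.

```python
import json
import time


def build_record(device_id: str, sensor_type: str, value: float):
    """Return (key, value) bytes for one reading, with origin metadata embedded."""
    payload = {
        "device_id": device_id,        # hypothetical field names
        "sensor_type": sensor_type,
        "value": value,
        "timestamp_ms": int(time.time() * 1000),
    }
    # Using the device ID as the key lets a key-hash partitioner keep
    # one device's readings together in a single partition.
    key = device_id.encode("utf-8")
    return key, json.dumps(payload).encode("utf-8")


def topic_for_device(device_id: str) -> str:
    """Per-device topic name, for the case where messages carry no origin metadata."""
    return f"sensors.{device_id}"      # hypothetical naming scheme


key, value = build_record("device-1", "temperature", 21.5)
print(key, value)
print(topic_for_device("device-1"))
```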

    As an alternative, a single topic with multiple partitions and a fixed mapping from device/sensor to partition (using a custom partitioner) might work, too. In this case, adding new partitions is no problem, as you control the data distribution and can keep it stable.
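
    A minimal sketch of such a fixed mapping, kept independent of any Kafka client library. The class name, the round-robin assignment of new pairs, and the in-memory table are all assumptions for illustration; a real setup would need to persist and share the mapping between producers.

```python
class FixedPartitionMap:
    """Assigns each (device, sensor) pair a partition once and never changes it."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._assigned = {}   # (device_id, sensor_id) -> partition
        self._next = 0        # round-robin cursor for pairs not seen before

    def partition_for(self, device_id: str, sensor_id: str) -> int:
        key = (device_id, sensor_id)
        if key not in self._assigned:
            # First time we see this pair: pick a partition and remember it.
            self._assigned[key] = self._next % self.num_partitions
            self._next += 1
        return self._assigned[key]

    def add_partitions(self, extra: int) -> None:
        """Grow the topic; existing assignments stay untouched."""
        self.num_partitions += extra


mapping = FixedPartitionMap(num_partitions=2)
before = mapping.partition_for("device-1", "temperature")
mapping.partition_for("device-2", "temperature")

mapping.add_partitions(2)  # topic now has 4 partitions
after = mapping.partition_for("device-1", "temperature")
new_pair = mapping.partition_for("device-3", "temperature")
print(before, after, new_pair)
```

    Most Kafka clients accept an explicit partition when sending a record, so the looked-up value could be passed there; in the Java client, the equivalent hook is a class implementing the Partitioner interface.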

    Update

    There is a blog post discussing this: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

  • 2021-01-01 06:01

    I would create topics based on sensors and partitions based on devices:

    A sensor on Device 1 -> topic A, partition 1.
    A sensor on Device 2 -> topic A, partition 2.
    B sensor on Device 2 -> topic B, partition 2.
    

    and so on.
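
    The scheme above could be sketched as a small routing function. The "topic-" prefix and the function name are hypothetical, and the partition number simply mirrors the device number as in the example. Note that Kafka numbers partitions from 0, so a topic used this way needs at least (highest device number + 1) partitions.

```python
def route(device_number: int, sensor_type: str):
    """Map a reading to (topic, partition): topic from sensor type, partition from device."""
    topic = f"topic-{sensor_type}"   # hypothetical naming scheme
    partition = device_number        # mirrors "Device 1 -> partition 1"
    return topic, partition


print(route(1, "A"))
print(route(2, "A"))
print(route(2, "B"))
```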

    I don't know what kind of sensors you have, but they seem to belong semantically to the same set of data. With the help of partitions you can have parallel processing.

    But it depends on how you want to process your data: is it more important to process sensors together, or devices?
