Should event hubs be split on message type?


This is a question full of tradeoffs; answering it comes down to exercising judgement about what systems you expect to build now and in the future, and how they might use the different event types.

Below is an excerpt from the guidance Jay Kreps has given for designing systems on top of Apache Kafka, which applies equally well to Event Hubs (with the major caveats of Event Hubs' shorter retention periods and its limit on the number of consumer groups).

Let’s begin with pure event data—the activities taking place inside the company. In a web company these might be clicks, impressions, and various user actions. FedEx might have package deliveries, package pickups, driver positions, notifications, transfers, and so on.

These types of events can be represented with a single logical stream per action type. For simplicity I recommend naming the Avro schema and the topic the same thing, e.g. PageViewEvent. If the event has a natural primary key you can use that to partition data in Kafka; otherwise the Kafka client will automatically load balance data for you.

...

We experimented at various times with mixing multiple events in a single topic and found this generally led to undue complexity. Instead, give each event its own topic; consumers can always subscribe to multiple such topics to get a mixed feed when they want that.
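As a rough sketch of the pattern that excerpt describes (one topic per event type, named after the schema, keyed by a natural primary key where one exists), producing might look like the following. The kafka-python client, topic names, and event shapes here are my own illustrative assumptions, not something from the original guidance:

```python
# Minimal sketch: one topic per event type, keyed by a natural primary key.
# Assumes the kafka-python client; any Kafka or Event Hubs client follows the
# same shape. Topic names mirror the event/schema names, e.g. PageViewEvent.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    key_serializer=lambda k: k.encode("utf-8") if k is not None else None,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# PageViewEvent keyed by user_id: all events for the same user land in the
# same partition, so their relative order is preserved within this topic.
producer.send(
    "PageViewEvent",
    key="user-123",
    value={"user_id": "user-123", "url": "/home"},
)

# A different event type goes to its own topic. With no key, the client
# load-balances the event across partitions for you.
producer.send(
    "PackagePickupEvent",
    value={"package_id": "pkg-42", "driver_id": "drv-7"},
)

producer.flush()
```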

I generally agree with Kreps' advice (and you should read that entire blog post if you're designing a system on Event Hubs/Kafka/Kinesis). Forcing subscribers to ignore messages they aren't interested in is not only annoying; it also becomes a real problem later if one of the event types starts to dominate the combined stream.
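The consumer side of the split-topic approach is correspondingly simple: a service that wants a mixed feed subscribes only to the topics it cares about instead of filtering a combined stream. Again a minimal sketch, assuming kafka-python and the hypothetical topic names above:

```python
# Sketch of the consumer side: subscribe to exactly the event types you need
# rather than discarding unwanted messages from one combined stream.
# Assumes kafka-python and the topic names used in the producer sketch above.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "PageViewEvent",
    "PackagePickupEvent",                 # only the types this service needs
    bootstrap_servers="localhost:9092",   # placeholder broker address
    group_id="mixed-feed-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:
    # record.topic identifies the event type; no per-message filtering needed.
    print(record.topic, record.value)
```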

But splitting into multiple streams and then combining them back together does have costs, and they need to be weighed in making a decision. I've listed some that come to mind.

  1. You lose ordering between events of different types from the same source unless you spend the effort to add it back (see the sketch after this list).

  2. If you want to commit progress (offsets/checkpoints) across the different topics together, you have to coordinate that yourself.

  3. If you are partitioning the event streams on a primary key shared between the topics and want the corresponding partitions in each topic to travel together, you can't use high-level clients like EventProcessorHost, because partitions can end up auto-balanced onto different processes.

  4. A consumer with one thread per partition ends up multiplying the needed number of threads by the number of topics. Probably not an issue unless you have expensive structures that can't be shared.
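To illustrate point 1: once related events live in separate topics, a consumer that needs a single time-ordered view has to re-merge them itself, typically by a timestamp or sequence number carried in each event. A plain-Python sketch with made-up event shapes:

```python
# Re-establishing cross-topic ordering by merging on a timestamp field.
# The event shapes and field names are made up purely for illustration.
import heapq

page_views = [
    {"ts": 1, "type": "PageViewEvent", "user_id": "u1", "url": "/home"},
    {"ts": 4, "type": "PageViewEvent", "user_id": "u1", "url": "/cart"},
]
orders = [
    {"ts": 3, "type": "OrderPlacedEvent", "user_id": "u1", "order_id": "o9"},
]

# Each stream is already ordered on its own; merge them into one ordered feed.
for event in heapq.merge(page_views, orders, key=lambda e: e["ts"]):
    print(event["ts"], event["type"])
```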

In my own deployment we use different event hubs for different event types even though we currently use the same code to process them all. This is simply because I expect to add new components that only care about certain event types. I hope this helps, and at worst I've told you to go look at the guidance for Kafka, since the principles are the same and it's been around longer.
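On the Event Hubs side specifically, "one hub per event type" just means each producer points at its own hub and uses a partition key where a natural one exists. A minimal sketch with the azure-eventhub Python SDK; the connection string and hub names are placeholders of my own:

```python
# One Event Hub per event type: each producer client targets its own hub.
# Connection string and hub names are placeholders, not real values.
from azure.eventhub import EventData, EventHubProducerClient

page_view_producer = EventHubProducerClient.from_connection_string(
    conn_str="<namespace-connection-string>",
    eventhub_name="page-view-events",       # hypothetical hub for one event type
)
pickup_producer = EventHubProducerClient.from_connection_string(
    conn_str="<namespace-connection-string>",
    eventhub_name="package-pickup-events",  # a second hub for another type
)

# Partition key keeps events for the same user together within this hub.
batch = page_view_producer.create_batch(partition_key="user-123")
batch.add(EventData('{"user_id": "user-123", "url": "/home"}'))
page_view_producer.send_batch(batch)

batch = pickup_producer.create_batch()
batch.add(EventData('{"package_id": "pkg-42", "driver_id": "drv-7"}'))
pickup_producer.send_batch(batch)

page_view_producer.close()
pickup_producer.close()
```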
