Question
Azure Event Hubs has a partition feature for scalability. When reading data from an app service, one EventProcessorHost can be tied to only one partition, so there is no way to act collectively on data coming from multiple partitions. With Stream Analytics, however, we can aggregate data over time. Does it take all the partitions into account when aggregating? That is, if readings are sent to 8 partitions, the aggregate should include all of those readings in the calculation. Thanks
Answer 1:
Yes. Based on the documentation, there are a couple of scenarios.
When the output supports partitioning as well, like another Event Hub, you can use Partition By:
you must make sure that your query is partitioned. This requires you to use Partition By in all the steps. Multiple steps are allowed, but they all must be partitioned by the same key. Currently, the partitioning key must be set to PartitionId in order for the job to be fully parallel.
When the output does not support partitioning (like Power BI), data is read without regard to its origin partition, so events from all partitions are included.
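The difference between the two scenarios can be illustrated with a small Python simulation. This is only a sketch of the semantics, not Stream Analytics code: the sample events, the 60-second window, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample events: (partition_id, event_time_seconds, reading)
EVENTS = [
    (0, 5, 10.0),
    (1, 12, 20.0),
    (2, 47, 30.0),
    (0, 61, 40.0),
    (1, 75, 60.0),
]

WINDOW_SECONDS = 60  # assumed tumbling window size


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return (event_time // WINDOW_SECONDS) * WINDOW_SECONDS


def average_merged(events):
    """Average per window across all partitions (query without Partition By)."""
    buckets = defaultdict(list)
    for _partition_id, event_time, reading in events:
        buckets[window_start(event_time)].append(reading)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def average_per_partition(events):
    """Average per (window, partition), as when partitioning by PartitionId."""
    buckets = defaultdict(list)
    for partition_id, event_time, reading in events:
        buckets[(window_start(event_time), partition_id)].append(reading)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


if __name__ == "__main__":
    print(average_merged(EVENTS))
    print(average_per_partition(EVENTS))
```

In the merged case each window yields one average that covers readings from every partition. In the partitioned case each window yields one result per partition, so a global aggregate would need a further non-partitioned step.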
Answer 2:
If you don't use Partition By PartitionId, data from all input partitions is merged before the aggregation. Events are ordered by timestamp (either arrival time or application time). This does mean that a lack of data in one partition can block the result; how long it blocks is controlled by the late arrival window.
[This page](https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-out-of-order-and-late-events) has additional details about the late arrival window, with examples.
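The blocking behaviour described above can be pictured with a toy watermark model in Python. This is a deliberately simplified sketch and not the service's actual algorithm: it assumes event time and wall-clock time share one clock, ignores out-of-order tolerance, and all function names and numbers are hypothetical.

```python
def partition_watermark(last_event_time, now, late_arrival_window):
    """Watermark for one partition in this toy model.

    A partition that has delivered events is trusted up to its latest event
    time. A silent partition is only assumed complete up to
    now - late_arrival_window.
    """
    idle_bound = now - late_arrival_window
    if last_event_time is None:
        return idle_bound
    return max(last_event_time, idle_bound)


def merged_watermark(last_event_times, now, late_arrival_window):
    """The merged stream can only advance as far as its slowest partition."""
    return min(
        partition_watermark(t, now, late_arrival_window)
        for t in last_event_times
    )


def window_can_close(window_end, last_event_times, now, late_arrival_window):
    """A window result is emitted once the merged watermark reaches its end."""
    return merged_watermark(last_event_times, now, late_arrival_window) >= window_end


if __name__ == "__main__":
    # Two active partitions and one silent partition (None), 30 s late arrival window.
    last_seen = [70, 65, None]
    print(window_can_close(60, last_seen, now=80, late_arrival_window=30))
    print(window_can_close(60, last_seen, now=90, late_arrival_window=30))
```

In this model the window ending at 60 stays open at time 80 because the silent partition has only been waited on up to 50. It closes at time 90, once the late arrival window has elapsed for that partition.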
Source: https://stackoverflow.com/questions/46129842/does-azure-stream-analytics-read-data-coming-from-all-partitions