Scalable in-order message processing on Azure serverless

Question

I need to create something on Azure that can process incoming streams of messages for a set of entities. We will have anywhere between 20 and 2,000 entities at any point in time; these get created and discarded dynamically. Messages will be generated by our on-premises system and sent to Azure using some queueing mechanism. Each message will be associated with a specific entity through an EntityId property. Messages belonging to the same entity must be processed in order with respect to each other.

At the same time, the solution must be scalable with respect to entities. If I have steady streams of messages for 1,000 entities, I'd want to have 1,000 concurrent executions of my logic. If an entity takes a long time to process one of its messages, this must not block any of the other entities from processing their messages. Each message may take anywhere from 100ms to 10s to process (vast majority below 1s), and each entity would receive an average of one message per second.
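To make the requirement concrete, here is a minimal in-process Python sketch of the semantics I'm after: one sequential worker per entity, with all entities running concurrently. It is only an illustration of the desired behaviour, not an Azure solution; `handle`, `entity_worker`, `dispatch`, and the `cost`/`seq` fields are hypothetical names made up for this example.

```python
import asyncio


async def handle(entity_id, payload, completed):
    # Hypothetical per-message logic; the sleep stands in for real work.
    await asyncio.sleep(payload["cost"])
    completed.append((entity_id, payload["seq"]))


async def entity_worker(entity_id, queue, completed):
    # One worker per entity: its messages are handled strictly one at a time.
    while True:
        payload = await queue.get()
        if payload is None:  # sentinel: the entity has been discarded
            return
        await handle(entity_id, payload, completed)


async def dispatch(messages):
    # Route each message to its entity's queue, creating workers on demand.
    queues, workers, completed = {}, [], []
    for entity_id, payload in messages:
        if entity_id not in queues:
            queues[entity_id] = asyncio.Queue()
            workers.append(
                asyncio.create_task(
                    entity_worker(entity_id, queues[entity_id], completed)
                )
            )
        queues[entity_id].put_nowait(payload)
    for queue in queues.values():
        queue.put_nowait(None)
    await asyncio.gather(*workers)
    return completed  # (entity_id, seq) pairs in completion order


def run_demo():
    messages = [
        ("slow", {"seq": 1, "cost": 0.05}),
        ("fast", {"seq": 1, "cost": 0.0}),
        ("slow", {"seq": 2, "cost": 0.0}),
        ("fast", {"seq": 2, "cost": 0.0}),
    ]
    return asyncio.run(dispatch(messages))
```

In `run_demo`, the "slow" entity's first message takes longer than everything else, yet the "fast" entity finishes both of its messages without waiting for it, while each entity still sees its own messages in sequence. What I'm looking for is a serverless equivalent of this that scales out across machines.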

Disappointingly, the Azure serverless stack does not seem to have any means of achieving this. These are the options I've considered and their problems:

1. Azure Functions triggered by a Service Bus queue with sessions. Azure Functions can be run as serverless on a consumption plan, making them perfect for elastic scaling. Service Bus sessions provide for in-order delivery, and are the closest implementation of my requirements. However, sessions are not supported in Azure Functions; see "Support Service Bus queues and topics which use sessions".
2. Logic Apps triggered by a Service Bus queue with sessions. This is supported out-of-the-box through the "Correlated in-order delivery using service bus sessions" template. The Logic App can then hook to an HTTP-triggered Azure Function for processing messages. The Logic App's only purpose is to prevent multiple messages belonging to the same entity/session from being processed concurrently. However, from the comments on my earlier question, I found this would not be scalable either. A Logic App can only execute 300,000 actions per 5 minutes, has a trigger concurrency limit of 50, and is said to be expensive. See "Limits and configuration information for Azure Logic Apps".
3. Azure Event Hubs with partitions, as discussed in "In order event processing with Azure Functions". This tends to be the most popular option, and the one recommended by Microsoft. However, Event Hubs only permit up to 32 partitions, and the number of partitions must be specified at creation; see "Features and terminology in Azure Event Hubs". This limitation of a fixed static set of partitions goes against the spirit of "serverless"; if we're limiting our degree of parallelism to 32, then we're not getting any better scalability than a parallel application running on a 32-core machine. The partition limit can be increased beyond 32 via a support ticket to Microsoft, but I wouldn't want to ask for scalability that's two orders of magnitude beyond what's available for general use. Event Hubs also lack some other basic properties, such as at-most-once delivery.
4. A dynamically created Service Bus queue per entity, with a singleton Azure Function spawned for it and bound exclusively to that specific queue. However, this would entail invoking the Azure resource management APIs as part of our operational code, and my impression is that Azure Functions weren't designed to be spawned dynamically this way.
5. Optimistic concurrency control against a persistent backing store, such as Redis, using the SequenceNumber property of the Service Bus queue messages for ordering. However, the programming model for this is quite complex and easy to get wrong: operations need to be performed in retry loops, with explicit attention paid to atomicity and idempotency each time. Also, it requires all messages to contain the full entity state; otherwise, information would be lost when we discard stale messages in case of races. It would be expensive for us (in terms of computation and bandwidth) to send the full entity snapshot with each incremental change, so we'd rather find a means that would allow us to process incremental messages in order instead.
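For reference, the optimistic concurrency option would look roughly like the following Python sketch. The store here is an in-memory stand-in with a compare-and-set operation, not a real Redis client, and `VersionedStore`, `compare_and_set`, and `apply_snapshot` are hypothetical names chosen for this illustration. It assumes sequence numbers start at 1 and that every message carries a full snapshot.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a store with compare-and-set (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def read(self, key):
        # Returns (last_applied_seq, state); (0, None) if the key is unknown.
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_seq, new_seq, new_state):
        # Writes only if nobody else has advanced the sequence number since
        # the caller's read; otherwise reports failure so the caller retries.
        with self._lock:
            current_seq, _ = self._data.get(key, (0, None))
            if current_seq != expected_seq:
                return False
            self._data[key] = (new_seq, new_state)
            return True


def apply_snapshot(store, entity_id, seq, snapshot, max_retries=10):
    """Apply a full-state message. Returns False if it was discarded as stale."""
    for _ in range(max_retries):
        last_seq, _ = store.read(entity_id)
        if seq <= last_seq:
            return False  # a newer (or the same) message was already applied
        if store.compare_and_set(entity_id, last_seq, seq, snapshot):
            return True
        # Lost a race with another writer: re-read and try again.
    raise RuntimeError("gave up after repeated write conflicts")


def run_demo():
    store = VersionedStore()
    # Two messages for one entity arrive out of order because of a race.
    newer_applied = apply_snapshot(store, "entity-1", 2, {"value": "B"})
    older_applied = apply_snapshot(store, "entity-1", 1, {"value": "A"})
    return newer_applied, older_applied, store.read("entity-1")
```

In `run_demo`, the message with sequence number 1 arrives after the one with sequence number 2 and is simply dropped. That is harmless when each message is a full snapshot, but if the messages were incremental changes, the content of the dropped one would be lost, which is exactly the problem described above.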

Is there any clean way of achieving in-order processing for a scalable number of entities on the Azure serverless stack?

Source: https://stackoverflow.com/questions/53781371/scalable-in-order-message-processing-on-azure-serverless

Tags

azure

azure-functions

azure-logic-apps

serverless