Question
We need to set up 4 Event Hubs and 3 Azure Functions. What is the best way to achieve high throughput, and which scalable parameters can we set so the system can handle 75k messages/sec?
- local.settings.json
- host.json
- prefetchCount
- maxBatchSize
Answer 1:
This article is definitely worth a read and is something I based some of my work on; I needed to achieve 50k messages/sec. https://azure.microsoft.com/en-gb/blog/processing-100-000-events-per-second-on-azure-functions/
An important consideration is how many partitions you will have, as this directly impacts your total throughput. As you scale out instances of your application, the Event Processor Host (EPH) will try to take ownership of processing a particular partition, and each partition can process 1 MB/sec ingress and 2 MB/sec egress (or 1,000 events/sec).
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-faq
You need to consider both message size and message counts. If possible, cram as many data points as possible into an event hub message. In my scenario, I'm processing 500 data points in each event hub message - it's much more efficient to extract lots of data from a single message rather than a small amount of data from lots of messages.
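As an illustration of that consumer pattern, here is a minimal sketch of an Event Hub-triggered function, assuming the Azure Functions Python programming model, that each event body is a JSON array of up to ~500 data points, and that process_point stands in for your real per-point logic (all assumptions for the example):

```python
import json
import logging
from typing import List

import azure.functions as func


def process_point(point: dict) -> None:
    """Placeholder for your per-data-point logic."""
    pass


def main(events: List[func.EventHubEvent]):
    # With "cardinality": "many" the trigger hands over a whole batch of events.
    total = 0
    for event in events:
        # Each event body is assumed to carry a JSON array of up to ~500 data points.
        points = json.loads(event.get_body().decode("utf-8"))
        for point in points:
            process_point(point)
        total += len(points)
    logging.info("Processed %d data points from %d events", total, len(events))
```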
For your throughput requirements, this is something you need to consider. Even at 32 partitions, at roughly 1,000 events/sec per partition that's not going to give you 75k msg/sec - you can ask Microsoft to increase the partition count, as they did in the original article I linked, where they run with 100 partitions.
As for configuration settings, I'm running with:

```json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 10,
      "eventProcessorOptions": {
        "maxBatchSize": 256,
        "prefetchCount": 512,
        "enableReceiverRuntimeMetric": true
      }
    }
  }
}
```
- I receive a batch of messages, up to 256
- Each message can contain up to 500 data points
- We checkpoint a partition after 10 batches
This means up to approximately 1.3 million data points (256 messages × 500 data points × 10 batches) could be processed again in the event of a failure that forces the functions to resume from the last known checkpoint. This is also important - are your updates idempotent, or does it not matter if they are reprocessed?
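To make the replay question concrete, here is a tiny, hypothetical illustration of an idempotent write: each data point carries a stable unique id and is upserted, so reprocessing after a checkpoint rollback overwrites rather than duplicates.

```python
def write_data_point(store: dict, point: dict) -> None:
    # "id" is assumed to be a stable, unique key carried in every data point.
    store[point["id"]] = point  # upsert: a replayed point overwrites, never duplicates


store = {}
point = {"id": "sensor-1|2020-10-14T12:00:00Z", "value": 21.5}
write_data_point(store, point)
write_data_point(store, point)  # simulated replay after resuming from a checkpoint
assert len(store) == 1          # no duplicate entry was created
```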
You are going to need to put the data from the messages into some sort of data store, and you will be inserting into it at a high rate - can your target data store cope with inserts at this frequency? What happens to your processing pipeline if your target store has an outage? I went with a similar approach to the one described in this article, which can be summarized as: in the event of any failure when processing a batch of messages, move the entire batch onto an 'errors' hub and let another function try to process them. You can't stop processing at this volume or you will fall behind!
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
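A hedged sketch of that 'errors hub' hand-off, assuming the Python programming model and the azure-eventhub SDK; the EVENTHUB_CONNECTION app setting, the "errors" hub name and the handle() helper are placeholders, not the article's actual code:

```python
import os
from typing import List

import azure.functions as func
from azure.eventhub import EventData, EventHubProducerClient

# Placeholder app setting and hub name - both are assumptions for this sketch.
_errors_producer = EventHubProducerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION"], eventhub_name="errors")


def handle(event: func.EventHubEvent) -> None:
    """Placeholder for your real per-event processing."""
    pass


def main(events: List[func.EventHubEvent]):
    try:
        for event in events:
            handle(event)
    except Exception:
        # Don't stall the partition: forward the whole batch to the errors hub
        # and let another function retry it. For brevity this assumes the batch
        # fits into a single EventDataBatch; split it if add() raises ValueError.
        batch = _errors_producer.create_batch()
        for event in events:
            batch.add(EventData(event.get_body()))
        _errors_producer.send_batch(batch)
```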
Another important point: how real-time does your processing need to be? If you start falling behind, would you need to scale out to try and catch up? How would you know if this was happening? I created a metric to track how far behind the latest event each partition is, which lets me visualize it and set up alerts - I also scale out my functions based on this number.
https://medium.com/@dylanm_asos/azure-functions-event-hub-processing-8a3f39d2cd0f
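A rough sketch of such a lag metric using the azure-eventhub SDK: for each partition, compare the newest sequence number on the hub with the last sequence number your consumers have processed (here a hypothetical dictionary; in practice you would read it from your checkpoints or processing state). The connection-string setting and "telemetry" hub name are placeholders.

```python
import os

from azure.eventhub import EventHubConsumerClient

# Hypothetical "last processed" sequence numbers per partition - in practice,
# read these from your checkpoint store or your own processing state.
last_processed = {"0": 10_500, "1": 10_498}

client = EventHubConsumerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION"],   # placeholder app setting
    consumer_group="$Default",
    eventhub_name="telemetry")           # placeholder hub name

with client:
    for pid in client.get_partition_ids():
        props = client.get_partition_properties(pid)
        lag = props["last_enqueued_sequence_number"] - last_processed.get(pid, 0)
        print(f"partition {pid}: ~{lag} events behind")
        # Emit 'lag' as a custom metric (e.g. to Application Insights) to alert and scale on.
```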
At the volumes you've mentioned, it's not just configuration that will get you there - there are a number of considerations.
Answer 2:
- If you're willing to write a lot more code instead of using Azure Functions, then writing your own application with the Event Hubs SDK is not easily beaten when it comes to throughput and flexibility of functionality.
- A great blog post is Azure Functions and Event Hubs: Optimising for Throughput. Here is my summary of its summary (which also appears at the end of the post).
- Publish in batches if possible.
- Keep partition count high.
- Keep maxBatchSize as high as possible (remember this is just a suggestion to the Functions runtime; there are too many variables and you may not get big enough batches even if you set maxBatchSize to a big number)
- Use a dedicated plan instead of consumption.
- Write efficient/fast code for your function.
Event Publishers
- Write to EH using batches (mind the size limit!) - a sketch follows this list. Btw, this publishing batch size has nothing to do with maxBatchSize
- Use AMQP for efficiency
- If reporting Application Time, use UTC
- If using partition affinity, avoid creating hot partitions with a badly chosen partition key, as this will create a skew on the processing side. If your scenario does not require FIFO or in-order processing (which can only be achieved within a single partition), do not specify the partition id at all, so writes are round-robin. Some more reading here
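The sketch referenced above: size-aware publishing with the azure-eventhub SDK, where EventDataBatch enforces the hub's size limit and no partition id is specified, so writes are round-robin. The connection-string setting and "telemetry" hub name are placeholders for this example.

```python
import os

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION"],   # placeholder app setting
    eventhub_name="telemetry")           # placeholder hub name


def publish(payloads) -> None:
    with producer:
        batch = producer.create_batch()  # no partition_id -> round-robin writes
        for payload in payloads:
            try:
                batch.add(EventData(payload))
            except ValueError:           # current batch hit the size limit
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(payload))
        if len(batch) > 0:
            producer.send_batch(batch)   # flush the final partial batch


publish(f'{{"point": {i}}}' for i in range(10_000))
```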
Event Hub
- Choose the number of partitions appropriately, since it defines the number of parallel consumers. More details here
- For high-throughput scenarios consider Azure Event Hubs Dedicated
- When working out how many Throughput Units you require, consider both the ingress and the egress sides. Multiple consumer groups will compete for egress throughput
- If you enable Event Hub Capture you can use the AVRO files landing on Blob Storage to trigger your cold path / batch processing, it’s a supported trigger too
Event Hub Trigger Settings: host.json and function.json
- Explicitly set “cardinality” to “many” in the function.json to enable batching of messages
- maxBatchSize in host.json: the default setting of 64 may not be sufficient for your pipeline; adjust, measure and adjust again. Keep in mind that editing host.json will restart your Azure Function
- prefetchCount in host.json: the meaning of this setting is "how many messages to fetch and cache before feeding them in batches of maxBatchSize to the Function". I usually set it explicitly to 2*maxBatchSize. By the way, setting it to any value below maxBatchSize will have a negative impact on performance by reducing the batch size
- batchCheckpointFrequency in host.json: have a look at the storage account associated with your Azure Function and you will see how checkpoints are stored as tiny json files per partition per consumer group. The default setting of 1 tells the Azure Function to create a checkpoint after successfully processing every batch. A batch will be considered successfully processed if your code runs successfully (you’re still responsible for catching exceptions). I usually start with the default value of 1 and increase this value a bit when I see throttling events on the storage account associated with the Function (things can get especially nasty when multiple Azure Functions share one storage account). The downside of increasing batchCheckpointFrequency is that, in case of a crash, your Function will have to replay more messages since the last checkpoint
Azure Function
- Make sure your code is written to process events in variable-size batches
- Use non-blocking async code (see the sketch after this list)
- Enable Application Insights but carefully assess the amount of telemetry you require and tweak the aggregation and sampling settings in host.json accordingly
- Disable built-in logging by deleting the AzureWebJobsDashboard app setting. By default, the Azure Function logs to Blob Storage and under high workloads you may lose telemetry due to throttling
- Consumption Plan may not always be a good fit from a performance perspective, consider using Premium App Service Plans or instead deploying the Event Processor Host on appropriately sized VMs/containers
- When dealing with Azure Event Hubs, there’s no concept of “locking”, “deadlettering”, etc. Make sure to handle exceptions at an individual message level. Great write-up on the subject is here
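The sketch referenced in the list above: a hedged example of a non-blocking handler in the Python programming model that processes whatever batch size the trigger delivers. It assumes "cardinality" is set to "many" in function.json, and write_downstream stands in for your real async sink.

```python
import asyncio
import json
from typing import List

import azure.functions as func


async def write_downstream(payload) -> None:
    await asyncio.sleep(0)  # stand-in for a real non-blocking I/O call (HTTP, DB, ...)


async def process_event(event: func.EventHubEvent) -> None:
    payload = json.loads(event.get_body().decode("utf-8"))
    await write_downstream(payload)


async def main(events: List[func.EventHubEvent]):
    # Process the whole variable-size batch concurrently instead of one blocking call at a time.
    await asyncio.gather(*(process_event(e) for e in events))
```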
Source: https://stackoverflow.com/questions/64358671/what-is-the-best-parameters-to-integrate-azure-functions-with-event-hubs-chain