Question
In Mule, I have quite a lot of records to process, where the processing includes some calculations, going back and forth to the database, etc. We can process collections of records with these options:
- Batch processing
- ForEach
- Splitter-Aggregator
So what are the main differences between them? When should we prefer one over the others?
The Mule batch processing option does not seem to support defining batch-job-scoped variables, for example. Or, what if I want to use multithreading to speed up the overall task? Or, which is better if I want to modify the payload during processing?
Answer 1:
When you write "quite many" I assume it's too much for main memory; this rules out splitter/aggregator, because it has to collect all records in order to return them as a list.
I assume you have your records in a stream or iterator, otherwise you probably have a memory problem anyway...
So when to use for-each and when to use batch?
For Each
The simplest solution, but it has some drawbacks:
- It is single threaded (so it may be too slow for your use case)
- It is "fire and forget": you can't collect anything within the loop, e.g. a record count
- There is no support for handling "broken" records
Within the loop, you can have several steps (message processors) to process your records (e.g. for the mentioned database lookup).
This may be a drawback or an advantage: the loop is synchronous. (If you want to process asynchronously, wrap it in an async scope.)
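A minimal sketch of what such a loop can look like in Mule 3.x XML (the flow name, the database configuration and the `id` field are placeholders, not taken from the question):

```xml
<!-- For-each loop: single threaded, iterates over the collection in the payload -->
<flow name="recordsForEachFlow">
    <foreach collection="#[payload]">
        <!-- per-record work, e.g. the mentioned database lookup;
             assumes each record exposes an "id" field -->
        <db:select config-ref="Database_Config">
            <db:parameterized-query>SELECT * FROM customers WHERE id = #[payload.id]</db:parameterized-query>
        </db:select>
        <logger level="INFO" message="Looked up record #[payload]"/>
    </foreach>
    <!-- after the loop the original collection payload is restored,
         so per-record results are not collected automatically -->
</flow>
```

Wrapping the whole `<foreach>` in an `<async>` scope makes the loop run in the background, but the records inside the loop are still processed one at a time.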
Batch
A little more stuff to do / to understand, but more features:
- When called from a flow, always asynchronous (this may be a drawback).
- Can be standalone (e.g. with a poll in the input phase to start it)
- When the data generated in the loading phase is too big, it is automatically offloaded to disk.
- Multithreading for free (number of threads configurable)
- Handling for "broken records": Batch steps may be executed for good/broken records only.
- You get statistics at the end (number of records, number of successful records, etc.)
So it looks like you are better off using batch.
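A hedged sketch of a batch job along these lines (Mule 3.x; all names are placeholders, and the poll/select in the input phase is just one way to make the job standalone):

```xml
<batch:job name="processRecordsBatch" max-failed-records="-1">
    <batch:input>
        <!-- standalone trigger: poll the source table every minute -->
        <poll frequency="60000">
            <db:select config-ref="Database_Config">
                <db:parameterized-query>SELECT * FROM records_to_process</db:parameterized-query>
            </db:select>
        </poll>
    </batch:input>
    <batch:process-records>
        <batch:step name="transformStep">
            <!-- per-record processing; records are handled by multiple threads -->
        </batch:step>
        <batch:step name="failedRecordsStep" accept-policy="ONLY_FAILURES">
            <!-- runs only for records that failed in the previous step -->
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <!-- payload here is a BatchJobResult carrying the statistics -->
        <logger level="INFO"
                message="#[payload.successfulRecords] successful, #[payload.failedRecords] failed"/>
    </batch:on-complete>
</batch:job>
```

From a flow, such a job can instead be triggered with `<batch:execute name="processRecordsBatch"/>`; the call returns immediately because batch execution is asynchronous, as noted above.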
Answer 2:
With Splitter and Aggregator, you are responsible for writing the splitting logic and then joining the pieces back together at the end of processing. It is useful when you want to process records asynchronously, possibly on different servers. It is less reliable than the other options, but parallel processing is possible here.
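A rough sketch of that pattern (Mule 3.x; the flow name is a placeholder, and the actual per-element processing and any parallel dispatch are up to you):

```xml
<flow name="splitAggregateFlow">
    <!-- split the collection payload into one message per element -->
    <collection-splitter/>
    <!-- per-element processing; to get parallelism, dispatch each element
         to another flow, VM queue or server instead of processing it inline -->
    <logger level="INFO" message="Processing #[payload]"/>
    <!-- join the processed elements back into a single list -->
    <collection-aggregator/>
</flow>
```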
Foreach is more reliable, but it processes records iteratively using a single thread (synchronously), so parallel processing is not possible. Each record creates a single message by default.
Batch processing is designed to process millions of records in a fast and reliable way. By default, 16 threads will process your records, and it is reliable as well.
Please go through the links below for more details.
https://docs.mulesoft.com/mule-user-guide/v/3.8/splitter-flow-control-reference
https://docs.mulesoft.com/mule-user-guide/v/3.8/foreach
Answer 3:
I have been using the approach of passing the records as an array to a stored procedure. You can call the stored procedure inside a for-each loop and set the loop's batch size accordingly to avoid round trips. I have used this approach and the performance is good. You may have to create another table to log results and put that logic in the stored procedure as well.
The link below has all the details: https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
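A hypothetical sketch of that approach using the Mule 3.x foreach `batchSize` attribute with the Database connector (procedure name, parameter name and configuration are placeholders; actually passing the array to Oracle still needs the type mapping described in the linked article):

```xml
<!-- hand the records to the stored procedure in chunks of 100
     to avoid one database round trip per record -->
<foreach batchSize="100">
    <!-- payload here is a sub-list of up to 100 records -->
    <db:stored-procedure config-ref="Oracle_Config">
        <db:parameterized-query>{ call process_records(:records) }</db:parameterized-query>
        <db:in-param name="records" value="#[payload]"/>
    </db:stored-procedure>
</foreach>
```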
Source: https://stackoverflow.com/questions/43413958/mule-batch-processing-vs-foreach-vs-splitter-aggregator