Spring Batch multithreaded processing: single file to multiple files

Asked by 一个人的身影 on 2021-01-19 19:38

My problem statement: read a CSV file with 10 million rows and store it in a DB, in as little time as possible.

I had implemented it using simple multi-threading.

3 Answers
  • 2021-01-19 20:15

    You can split your input file into many files, then use a Partitioner to load the small files in parallel threads; but on error you must clean the DB and restart the whole job.

    <batch:job id="transformJob">
        <batch:step id="deleteDir" next="cleanDB">
            <batch:tasklet ref="fileDeletingTasklet" />
        </batch:step>
        <batch:step id="cleanDB" next="split">
            <batch:tasklet ref="countThreadTasklet" />
        </batch:step>
        <batch:step id="split" next="partitionerMasterImporter">
            <batch:tasklet>
                <batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
            </batch:tasklet>
        </batch:step>
        <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
            <batch:partition step="importChunked" partitioner="filePartitioner">
                <batch:handler grid-size="10" task-executor="taskExecutor" />
            </batch:partition>
        </batch:step>
    </batch:job>
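
    The XML above references a filePartitioner bean and an importChunked slave step that are not shown. Here is a minimal sketch of those two pieces, written in Java config for brevity (the same beans can be declared in XML). It assumes Spring Batch's stock MultiResourcePartitioner; the /tmp/chunks directory and the bean names are illustrative, not from the original answer.

    import java.io.IOException;

    import org.springframework.batch.core.configuration.annotation.StepScope;
    import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.Resource;
    import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

    @Configuration
    public class PartitionConfig {

        // One partition (and hence one slave step execution) per split file.
        @Bean
        public MultiResourcePartitioner filePartitioner() throws IOException {
            MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
            partitioner.setKeyName("fileName"); // key the slave reader looks up
            partitioner.setResources(new PathMatchingResourcePatternResolver()
                    .getResources("file:/tmp/chunks/*.csv")); // assumed split-output dir
            return partitioner;
        }

        // Step-scoped reader used by importChunked: each thread gets its own
        // instance, bound to the file its partition carries.
        @Bean
        @StepScope
        public FlatFileItemReader<String> slaveReader(
                @Value("#{stepExecutionContext['fileName']}") Resource file) {
            FlatFileItemReader<String> reader = new FlatFileItemReader<>();
            reader.setResource(file);
            reader.setLineMapper(new PassThroughLineMapper()); // raw lines as items
            return reader;
        }
    }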
    

    Full example code (on Github).

    Hope this helps.

  • 2021-01-19 20:20
    1. About multi-threaded reading: see How to set up multi-threading in Spring Batch?; the answer there will point you in the right direction. That sample also has some considerations about restart for CSV files. A sketch of a multi-threaded step follows this list.
    2. The job should fail automatically if an error occurs on a thread: I have never tried it, but this should be the default behaviour.
    3. Spring Batch How to set time interval between each call in a Chunk tasklet can be a start. Also see the official doc about backoff policies (second sketch below): "When retrying after a transient failure it often helps to wait a bit before trying again, because usually the failure is caused by some problem that will only be resolved by waiting. If a RetryCallback fails, the RetryTemplate can pause execution according to the BackOffPolicy in place."
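
    For point 1, a minimal sketch of a multi-threaded chunk step in Java config; the bean names, pool size of 10, and String item type are illustrative assumptions. Note that a plain FlatFileItemReader is not thread-safe, so for real use wrap it (e.g. in a SynchronizedItemStreamReader) or prefer partitioning as in the other answer.

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.task.TaskExecutor;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    public class MultiThreadedStepConfig {

        @Bean
        public TaskExecutor stepTaskExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(10); // illustrative pool size
            executor.setMaxPoolSize(10);
            executor.initialize();
            return executor;
        }

        // Chunks are read/processed/written concurrently on pool threads.
        @Bean
        public Step multiThreadedStep(StepBuilderFactory steps,
                                      ItemReader<String> reader,   // must be thread-safe
                                      ItemWriter<String> writer) {
            return steps.get("multiThreadedStep")
                    .<String, String>chunk(1000)  // commit interval
                    .reader(reader)
                    .writer(writer)
                    .taskExecutor(stepTaskExecutor())
                    .throttleLimit(10)            // cap concurrent chunk workers
                    .build();
        }
    }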
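
    And for point 3, a small self-contained sketch of the backoff behaviour quoted above, using Spring Retry's RetryTemplate. The intervals and attempt count are illustrative, and flakyCall() stands in for whatever transient operation is being retried.

    import org.springframework.retry.backoff.ExponentialBackOffPolicy;
    import org.springframework.retry.policy.SimpleRetryPolicy;
    import org.springframework.retry.support.RetryTemplate;

    public class BackoffExample {

        public static void main(String[] args) throws Exception {
            RetryTemplate retryTemplate = new RetryTemplate();

            ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
            backOff.setInitialInterval(500);  // first pause: 500 ms
            backOff.setMultiplier(2.0);       // then 1 s, 2 s, 4 s ...
            backOff.setMaxInterval(10_000);   // never wait more than 10 s
            retryTemplate.setBackOffPolicy(backOff);

            retryTemplate.setRetryPolicy(new SimpleRetryPolicy(4)); // 4 attempts total

            // Pauses between attempts according to the BackOffPolicy above.
            String result = retryTemplate.execute(context -> flakyCall());
            System.out.println(result);
        }

        // Placeholder for the real transient operation (e.g. a DB write).
        private static String flakyCall() {
            return "ok";
        }
    }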

    Let me know if this helps, or how you solved the problem, because I'm interested for my (future) work!
    I hope my indications are helpful.

  • 2021-01-19 20:25

    Here is how I solved the problem.

    1. Read the file and split it into chunks using buffered FileChannel readers and writers (the fastest way to read/write files; Spring Batch itself uses the same underneath). I implemented this so it runs before the job starts (although it could also run as a job step using a method invoker). A sketch follows this list.

    2. Start the job with the chunk directory location as a job parameter.

    3. Use a MultiResourcePartitioner, which takes the directory location and creates a slave step in a separate thread for each file.

    4. In the slave step, get the file passed by the partitioner and use Spring Batch's item reader to read it.

    5. Use a database item writer (I'm using the MyBatis batch item writer) to push the data to the database.

    6. It's best to make the split size (lines per chunk file) equal to the step's commit interval.
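
    A minimal sketch of the pre-job split from step 1, pairing a FileChannel with buffered readers/writers as described. All paths and the 10,000-line chunk size (matched to the step's commit interval, per point 6) are illustrative, and it assumes simple one-record-per-line CSV without embedded newlines.

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class CsvSplitter {

        /** Splits source into files of at most linesPerChunk lines inside outDir. */
        public static void split(Path source, Path outDir, int linesPerChunk) throws IOException {
            Files.createDirectories(outDir);
            try (FileChannel channel = FileChannel.open(source, StandardOpenOption.READ);
                 BufferedReader reader = new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
                String line;
                int linesInChunk = 0;
                int chunkIndex = 0;
                BufferedWriter writer = newChunkWriter(outDir, chunkIndex);
                try {
                    while ((line = reader.readLine()) != null) {
                        if (linesInChunk == linesPerChunk) {   // current chunk is full
                            writer.close();
                            writer = newChunkWriter(outDir, ++chunkIndex);
                            linesInChunk = 0;
                        }
                        writer.write(line);
                        writer.newLine();
                        linesInChunk++;
                    }
                } finally {
                    writer.close();
                }
            }
        }

        private static BufferedWriter newChunkWriter(Path dir, int index) throws IOException {
            return Files.newBufferedWriter(dir.resolve("chunk-" + index + ".csv"),
                    StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            // Chunk size chosen to match the slave step's commit interval (point 6).
            split(Paths.get("/tmp/big.csv"), Paths.get("/tmp/chunks"), 10_000);
        }
    }

    The output directory is then what step 2 passes as the job parameter, and the partitioner from step 3 enumerates the chunk-*.csv files it contains.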