Performance optimization for processing of 115 million records for inserting into Oracle

执念已碎 2021-01-15 17:23

I have a requirement to read a 19 GB text file, placed on a Unix server, that contains around 115 million records. My Spring Batch (Launcher) is getting triggered by Au

1 answer
  •  慢半拍i (original poster)
    2021-01-15 18:21

    In a project I worked on, we had to transfer 5 billion records from DB2 to Oracle, with quite complex transformation logic. During the transformation, the data was saved to different files about 4 times. We were able to insert about 50,000 records per second into an Oracle database. From that point of view, doing it in under 4 hours seems realistic.
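To make the "under 4 hours" estimate concrete, here is a minimal back-of-the-envelope sketch. It assumes the rate quoted above is a sustained per-second insert rate and that the 4-hour window is the target. The class and method names are hypothetical and only serve the illustration.

```java
// Rough throughput check. The inputs are assumptions: 115 million records
// (from the question), a 4-hour window and roughly 50,000 inserts per second
// (both from the answer above).
class ThroughputCheck {

    // Records per second needed to load `records` rows within `hours` hours
    // (integer division, so the result is rounded down).
    static long requiredRatePerSecond(long records, long hours) {
        return records / (hours * 3600L);
    }

    // Seconds needed to load `records` rows at a sustained `ratePerSecond`.
    static long secondsAtRate(long records, long ratePerSecond) {
        return records / ratePerSecond;
    }

    public static void main(String[] args) {
        long records = 115_000_000L;
        // Rate needed to finish inside the 4-hour window.
        System.out.println(requiredRatePerSecond(records, 4) + " records/s needed");
        // Duration if the load really sustains 50,000 inserts per second.
        System.out.println(secondsAtRate(records, 50_000L) + " s at 50,000 records/s");
    }
}
```

Under these assumptions the window requires roughly 8,000 records per second, well below the rate we reached, and at the full rate the load would take about 2,300 seconds (a little under 40 minutes).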

    You didn't state where exactly your bottlenecks are, but here are some ideas.

    1. Parallelisation - can you split the file into chunks that could be processed in parallel, for instance by several instances of your job?
    2. Chunk size - we used a chunk size of 5,000 to 10,000 when writing to Oracle.
    3. Removing unnecessary data parsing, especially Date/Timestamp parsing - for instance, we had a lot of timestamps in our data, but they were not relevant for the processing logic. Since we had to read and write them from/to a file a couple of times during processing, we didn't parse them; we just kept the string representation. Moreover, a lot of these timestamps had special values, like 1.1.0001 00:00:00.000000 or 31.12.9999 23.59.59.000000, so we used LD or HD (for low date and high date) to represent them.
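For idea 1, one way to split a large text file is to cut it into byte ranges that each start at the beginning of a line, so that every worker can read its own range independently. The sketch below is only an illustration under that assumption, not the code we used; `LineAlignedSplitter` and `split` are hypothetical names, and it assumes `\n` line endings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a file into at most `parts` byte ranges
// [start, end), each starting at the beginning of a line.
class LineAlignedSplitter {

    static List<long[]> split(String path, int parts) {
        List<long[]> ranges = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            long approx = Math.max(1, length / parts); // target size per range
            long start = 0;
            while (start < length) {
                long end = Math.min(length, start + approx);
                if (end < length) {
                    // Move the boundary forward to just past the next '\n',
                    // so that no record is cut in half.
                    file.seek(end);
                    int b;
                    while ((b = file.read()) != -1 && b != '\n') {
                        // keep scanning until end of line or end of file
                    }
                    end = file.getFilePointer();
                }
                ranges.add(new long[] {start, end});
                start = end;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Usage: java LineAlignedSplitter <file> <parts>
        for (long[] r : split(args[0], Integer.parseInt(args[1]))) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

Each range could then be handed to its own job instance or partition. The boundary scan reads byte by byte, which is acceptable here because it touches at most one line per boundary.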
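Idea 3 can be sketched as a plain string mapping: the two special values are replaced by short markers, and every other timestamp is passed through as an unparsed string. This is a minimal illustration, not our original code; the class name `TimestampCodec` is hypothetical, and the two literal values are copied exactly as written above.

```java
// Hypothetical codec: never parses a timestamp, only swaps the two
// sentinel values for short markers and back.
class TimestampCodec {
    // Sentinel values exactly as they appear in the text above.
    static final String LOW_DATE = "1.1.0001 00:00:00.000000";
    static final String HIGH_DATE = "31.12.9999 23.59.59.000000";

    // Shortens the sentinels; any other value stays an unparsed string.
    static String encode(String raw) {
        if (LOW_DATE.equals(raw)) {
            return "LD";
        }
        if (HIGH_DATE.equals(raw)) {
            return "HD";
        }
        return raw;
    }

    // Restores the full sentinel value right before writing to the database.
    static String decode(String stored) {
        if ("LD".equals(stored)) {
            return LOW_DATE;
        }
        if ("HD".equals(stored)) {
            return HIGH_DATE;
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(encode(LOW_DATE));
        System.out.println(decode("HD"));
    }
}
```

Because nothing is parsed, the intermediate files get smaller and no date-parsing cost is paid on each read and write pass. Conversion to a real date type is then only needed once, at the final insert.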

    HTH.
