Performance optimization for processing 115 million records for insertion into Oracle

执念已碎 · asked 2021-01-15 17:23

I have a requirement where I am reading a text file placed on Unix, about 19 GB in size and containing around 115 million records. My Spring Batch (Launcher) is triggered by Au…

1 Answer
  • 2021-01-15 18:21

    In a project I worked on, we had to transfer 5 billion records from DB2 to Oracle, with quite complex transformation logic. During the transformation the data was written out to intermediate files about 4 times. We were able to insert into the Oracle DB at a rate of about 50'000 records per second. From that point of view, doing it in under 4 hours seems realistic.

    You didn't state where exactly your bottlenecks are, but here are some ideas.

    1. Parallelisation - can you split the file into chunks that can be processed in parallel, for instance by several instances of your job? (A sketch of this with Spring Batch partitioning follows after this list.)
    2. Chunk size - we used a chunk size of 5000 to 10000 when writing to Oracle; the sketch below uses 5000.
    3. Removing unnecessary data parsing, especially Date/Timestamp parsing - for instance, we had a lot of timestamps in our data, but they were not relevant for the processing logic. Since we had to read and write them from/to a file a couple of times during processing, we didn't parse them; we just kept the string representation. Moreover, a lot of these timestamps had special values, like 1.1.0001 00:00:00.000000 or 31.12.9999 23.59.59.000000, so we used LD and HD (for low date and high date) to represent them. (A minimal sketch of this follows below as well.)
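
    For points 1 and 2, here is a rough sketch of what that can look like with Spring Batch partitioning and a chunk-oriented worker step. Everything concrete in it is an assumption made for the example: the file location /data/in/records.txt, the pipe-delimited layout, the table TARGET_TABLE with columns ID/NAME/CREATED_TS, the 8 partitions and the chunk size of 5000. The idea is only that a Partitioner hands every worker step its own line range of the big file, and each worker writes to Oracle with one JDBC batch per chunk.

        import java.util.HashMap;
        import java.util.Map;
        import javax.sql.DataSource;
        import org.springframework.batch.core.Job;
        import org.springframework.batch.core.Step;
        import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
        import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
        import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
        import org.springframework.batch.core.configuration.annotation.StepScope;
        import org.springframework.batch.core.partition.support.Partitioner;
        import org.springframework.batch.item.ExecutionContext;
        import org.springframework.batch.item.database.JdbcBatchItemWriter;
        import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
        import org.springframework.batch.item.file.FlatFileItemReader;
        import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
        import org.springframework.beans.factory.annotation.Value;
        import org.springframework.context.annotation.Bean;
        import org.springframework.context.annotation.Configuration;
        import org.springframework.core.io.FileSystemResource;
        import org.springframework.core.task.SimpleAsyncTaskExecutor;
        import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;

        @Configuration
        @EnableBatchProcessing
        public class PartitionedOracleImportConfig {

            private static final long TOTAL_LINES = 115_000_000L; // assumed known up front (or counted once)
            private static final int GRID_SIZE = 8;               // parallel workers; tune against your DB

            // Point 1: hand every worker its own [startLine, endLine) slice of the big file.
            @Bean
            public Partitioner lineRangePartitioner() {
                return gridSize -> {
                    Map<String, ExecutionContext> partitions = new HashMap<>();
                    long slice = TOTAL_LINES / gridSize;
                    for (int i = 0; i < gridSize; i++) {
                        ExecutionContext ctx = new ExecutionContext();
                        ctx.putLong("startLine", i * slice);
                        ctx.putLong("endLine", i == gridSize - 1 ? TOTAL_LINES : (i + 1) * slice);
                        partitions.put("partition" + i, ctx);
                    }
                    return partitions;
                };
            }

            // Step-scoped reader: skips to its slice and stops after (endLine - startLine) records.
            @Bean
            @StepScope
            public FlatFileItemReader<String[]> reader(
                    @Value("#{stepExecutionContext['startLine']}") Long startLine,
                    @Value("#{stepExecutionContext['endLine']}") Long endLine) {
                return new FlatFileItemReaderBuilder<String[]>()
                        .name("recordReader")
                        .resource(new FileSystemResource("/data/in/records.txt"))  // assumed location
                        .linesToSkip(startLine.intValue())
                        .maxItemCount((int) (endLine - startLine))
                        .lineMapper((line, lineNumber) -> line.split("\\|"))       // assumed pipe-delimited
                        .build();
            }

            // Point 2: one transaction and one JDBC batch per chunk of 5000 records.
            @Bean
            public Step workerStep(StepBuilderFactory steps, FlatFileItemReader<String[]> reader,
                                   JdbcBatchItemWriter<String[]> writer) {
                return steps.get("workerStep")
                        .<String[], String[]>chunk(5000)
                        .reader(reader)
                        .writer(writer)
                        .build();
            }

            @Bean
            public JdbcBatchItemWriter<String[]> writer(DataSource dataSource) {
                return new JdbcBatchItemWriterBuilder<String[]>()
                        .dataSource(dataSource)
                        .sql("INSERT INTO TARGET_TABLE (ID, NAME, CREATED_TS) VALUES (:id, :name, :createdTs)")
                        .itemSqlParameterSourceProvider(f -> new MapSqlParameterSource()
                                .addValue("id", f[0])
                                .addValue("name", f[1])
                                .addValue("createdTs", f[2]))  // timestamp stays an unparsed String (point 3)
                        .build();
            }

            // Master step fans the partitions out over a task executor, so the workers run concurrently.
            @Bean
            public Step masterStep(StepBuilderFactory steps, Step workerStep, Partitioner lineRangePartitioner) {
                return steps.get("masterStep")
                        .partitioner("workerStep", lineRangePartitioner)
                        .step(workerStep)
                        .gridSize(GRID_SIZE)
                        .taskExecutor(new SimpleAsyncTaskExecutor("import-"))
                        .build();
            }

            @Bean
            public Job importJob(JobBuilderFactory jobs, Step masterStep) {
                return jobs.get("importJob").start(masterStep).build();
            }
        }

    Note that linesToSkip still has to stream past all the skipped lines, so for a 19 GB file it can be cheaper to physically split the input up front (for example with the Unix split command) and give every partition its own file via a MultiResourcePartitioner.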
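
    For point 3, a minimal sketch of the "don't parse, just tokenize the sentinel values" idea; the two timestamp literals are the ones mentioned above, while the class and method names are made up for the example:

        // Timestamps are carried through as plain Strings; only the two special values are
        // collapsed to short tokens in the intermediate files and expanded again at the end.
        public final class TimestampTokens {

            private static final String LOW_DATE  = "1.1.0001 00:00:00.000000";
            private static final String HIGH_DATE = "31.12.9999 23.59.59.000000";

            private TimestampTokens() {}

            // Used when writing an intermediate file.
            public static String toToken(String rawTimestamp) {
                if (LOW_DATE.equals(rawTimestamp))  return "LD";
                if (HIGH_DATE.equals(rawTimestamp)) return "HD";
                return rawTimestamp;   // everything else stays an unparsed String
            }

            // Used just before the value goes into the final INSERT.
            public static String fromToken(String token) {
                if ("LD".equals(token)) return LOW_DATE;
                if ("HD".equals(token)) return HIGH_DATE;
                return token;
            }
        }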

    HTH.
