Many small files or one big file? (Or, Overhead of opening and closing file handles) (C++)

Asked 2021-02-01 21:22 by 春和景丽 · 6 answers · 631 views

I have created an application that does the following:

  1. Make some calculations and write the calculated data to a file - repeat 500,000 times (over all ...
6 Answers
  •  梦毁少年i, answered 2021-02-01 21:32

    Each file is ~212k, so over all I have ~300GB of data. It looks like the entire process takes ~40 days ... all the calculations are serial (each calculation is dependent on the one before), so I can't parallelize this process to different CPUs or PCs. ... pretty sure most of the overhead goes to file system access ... Every time I access a file I open a handle to it and then close it once I finish reading the data.

    Writing 300GB of data serially might take around 40 minutes (300GB at a sequential throughput of roughly 125MB/s), which is only a tiny fraction of 40 days. Raw disk write performance shouldn't be the issue here.

    Your idea of opening the file only once is spot-on. Closing the file after every operation probably forces your processing to block until the disk has completely written out all the data, negating the benefits of the OS disk cache.
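    As a rough sketch of that change (the file name, record size, and loop count are illustrative assumptions, not the asker's actual code), a single std::ofstream kept open for the entire run could look like this:

        // Open the output file once and reuse the handle for every record,
        // instead of an open/write/close cycle per calculation.
        #include <cstdint>
        #include <fstream>
        #include <vector>

        int main() {
            std::ofstream out("results.bin", std::ios::binary);   // opened once

            for (int i = 0; i < 500000; ++i) {
                // ~212k of calculated data per iteration (placeholder contents)
                std::vector<double> record(212 * 1024 / sizeof(double));
                // ... fill `record` with the calculated values ...
                out.write(reinterpret_cast<const char*>(record.data()),
                          record.size() * sizeof(double));
                // no close() here, so the OS write cache keeps absorbing data
            }
            return 0;   // `out` flushes and closes once, on destruction
        }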

    My bet is that the fastest implementation of this application will use a memory-mapped file; all modern operating systems have this capability, and it can end up being the simplest code too. You'll need a 64-bit processor and operating system, but you should not need 300GB of RAM. Map the whole file into the address space at once and just read and write your data through pointers.
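    A minimal sketch of that memory-mapped approach on a POSIX system (Linux/macOS; on Windows the equivalent is CreateFileMapping/MapViewOfFile) is below; the file name, total size, and element type are placeholders, not taken from the question:

        #include <cstdint>
        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main() {
            const size_t file_size = 300ULL * 1024 * 1024 * 1024;   // ~300GB of results

            int fd = open("results.bin", O_RDWR | O_CREAT, 0644);
            if (fd < 0) return 1;
            if (ftruncate(fd, file_size) != 0) return 1;   // reserve the full file size

            // Map the entire file into the 64-bit address space in one call.
            void* base = mmap(nullptr, file_size, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
            if (base == MAP_FAILED) return 1;

            // Read and write results through plain pointers; the OS pages the
            // data in and out on demand, so 300GB of RAM is not required.
            double* data = static_cast<double*>(base);
            data[0] = 42.0;

            munmap(base, file_size);
            close(fd);
            return 0;
        }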
