I have created an application that does the following:
Each file is ~212 KB, so overall I have ~300 GB of data. It looks like the entire process takes ~40 days ... all the calculations are serial (each calculation is dependent on the one before), so I can't parallelize this process across different CPUs or PCs. ... I'm pretty sure most of the overhead goes to file system access ... Every time I access a file I open a handle to it and then close it once I finish reading the data.
Writing 300 GB of data serially might take ~40 minutes, only a tiny fraction of 40 days, so raw disk write performance shouldn't be the issue here.
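As a rough sanity check (assuming a sustained sequential write rate of about 125 MB/s, typical for a spinning disk): 300 GB / 125 MB/s ≈ 2400 s ≈ 40 minutes.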
Your idea of opening the file only once is spot on. Closing the file after every operation is probably causing your processing to block until the disk has completely written out all the data, negating the benefits of disk caching.
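A minimal sketch of that idea, assuming POSIX C and a hypothetical fixed record size (the actual record layout isn't in your post): open each data file once, do all the reads and writes through the same handle, and close it only when you're done with that file.

```c
#include <stdio.h>

#define RECORD_SIZE 4096   /* assumed record size, not from the original post */

static void process_file(const char *path)
{
    FILE *f = fopen(path, "r+b");   /* one open per file, not one per access */
    if (!f) { perror(path); return; }

    unsigned char record[RECORD_SIZE];
    while (fread(record, 1, RECORD_SIZE, f) == RECORD_SIZE) {
        /* ... serial calculation on 'record'; results can be written back
         * through the same handle with fseek() + fwrite() ... */
    }

    fclose(f);   /* close once, after all work on this file is finished */
}
```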
My bet is that the fastest implementation of this application will use a memory-mapped file, a capability all modern operating systems provide. It can end up being the simplest code, too. You'll need a 64-bit processor and operating system, but you should not need 300 GB of RAM. Map the whole file into address space at once and just read and write your data through pointers.
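A minimal sketch, assuming POSIX (mmap); Windows has the equivalent CreateFileMapping()/MapViewOfFile(). One ~212 KB data file is mapped read/write and accessed directly through a pointer, with the OS handling paging and write-back.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int process_mapped(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror(path); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

    /* Map the whole file read/write; the OS flushes changes back to disk. */
    unsigned char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

    /* ... read and write your data directly through data[i] ... */

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```

Because the mapping only reserves address space, not physical memory, the working set stays small even though the total data is ~300 GB spread over many files.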