I\'ve got a moderately big set of data, about 800 MB or so, that is basically some big precomputed table that I need to speed some computation by several orders of magnitude (cr
For the write case, on Java 7, AsynchronousFileChannel should be looked at.
When performing random record-oriented writes across large files (exceeding physical memory so caching isn't helping everything) on NTFS, I find that AsynchronousFileChannel performs over twice as many operations, in single-threaded mode, versus a normal FileChannel (on a 10GB file, 160 byte records, completely random writes, some random content, several hundred iterations of benchmarking loop to achieve steady state, roughly 5,300 writes per second).
My best guess is that because the asynchronous io boils down to overlapped IO in Windows 7, the NTFS file system driver is able to update its own internal structures faster when it doesn't have to create a sync point after every call.
I micro-benchmarked against RandomAccessFile to see how it would perform (results are very close to FileChannel, and still half of the performance of AsynchronousFileChannel.
Not sure what happens with multi-threaded writes. This is on Java 7, on an SSD (the SSD is an order of magnitude faster than magnetic, and another order of magnitude faster on smaller files that fit in memory).
Will be interesting to see if the same ratios hold on Linux.