Optimization of sequential I/O operations on large file sizes

Submitted by £可爱£侵袭症+ on 2019-12-11 09:14:09

Question


Compiler:  Microsoft C++ 2005
Hardware: AMD 64-bit (16 GB)


Sequential, read-only access to an 18 GB file is performed with the following timing, file-access, and file-structure characteristics:

18,184,359,164 (file length)
11,240,476,672 (ntfs compressed file length)

Time    File         Method                                 Disk
14:33?  compressed   fstream                                fixed disk
14:06   normal       fstream                                fixed disk
12:22   normal       winapi                                 fixed disk
11:47   compressed   winapi                                 fixed disk
11:29   compressed   fstream                                ram disk
10:37   compressed   winapi                                 ram disk
 7:18   compressed   7z stored decompression to ntfs 12gb   ram disk
 6:37   normal       copy to same volume                    fixed disk



The fstream constructor and access:

#include <fstream>
#include <cstdlib>
using namespace std;

#define BUFFERSIZE 524288
    unsigned int mbytes = BUFFERSIZE;
    char * databuffer0 = (char*) malloc (mbytes);
    ifstream datafile;
    datafile.open("drv:/file.ext", ios::in | ios::binary );
    datafile.read (databuffer0, mbytes);


The winapi constructor and access:

#include <windows.h>
#include <tchar.h>

#define BUFFERSIZE 524288
    unsigned int mbytes = BUFFERSIZE;
    const TCHAR* const filex = _T("drv:/file.ext");
    char   ReadBuffer[BUFFERSIZE] = {0};   // on the stack (1 MB default with MSVC) -- likely why sizes much beyond 0.5 MB fail
    HANDLE hFile;
    DWORD  dwBytesRead = 0;
    hFile = CreateFile(filex, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if( FALSE == ReadFile(hFile, ReadBuffer, BUFFERSIZE-1, &dwBytesRead, NULL))
    { ...

For the fstream method, buffer sizes up to 16 MB do not decrease processing time. For the winapi method, all buffer sizes beyond 0.5 MB fail. What methods would optimize this implementation's processing time?

Answer 1:


Did you try memory-mapping the file? In my test this was always the fastest way to read large files.

Update: Here's an old, but still accurate description of memory mapped files: http://msdn.microsoft.com/en-us/library/ms810613.aspx
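For a concrete starting point, here is a minimal sketch (mine, not part of the original answer) that reads a large file sequentially through a series of mapped views; the path reuses the asker's placeholder, and the 64 MB view size is an arbitrary tunable, not a recommendation from the answer:

    #include <windows.h>

    int main()
    {
        HANDLE hFile = CreateFile(TEXT("drv:/file.ext"), GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
        if (hFile == INVALID_HANDLE_VALUE) return 1;

        LARGE_INTEGER size;
        GetFileSizeEx(hFile, &size);

        // maximum size of 0/0 = map the whole file
        HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (hMap == NULL) { CloseHandle(hFile); return 1; }

        // view offsets must be multiples of the 64 KB allocation granularity
        const ULONGLONG viewSize = 64ULL * 1024 * 1024;   // 64 MB per view (tunable)
        for (ULONGLONG off = 0; off < (ULONGLONG)size.QuadPart; off += viewSize)
        {
            SIZE_T len = (SIZE_T)min(viewSize, (ULONGLONG)size.QuadPart - off);
            const char* p = (const char*) MapViewOfFile(hMap, FILE_MAP_READ,
                                                        (DWORD)(off >> 32),
                                                        (DWORD)(off & 0xFFFFFFFF), len);
            if (p == NULL) break;
            // ... consume len bytes at p ...
            UnmapViewOfFile(p);
        }
        CloseHandle(hMap);
        CloseHandle(hFile);
        return 0;
    }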




Answer 2:


Try this.

hf = CreateFile(filex, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);

Then the reading loop. Minor details omitted, as I was typing this on an iPad...

const int NUM_IO  = 4;                 // keep four reads outstanding
const int bufsize = 4*1024*1024;       // 4 MB per read

HANDLE         ev[NUM_IO];
unsigned char* pbuffer[NUM_IO];
for (int i = 0; i < NUM_IO; ++i) {
    ev[i] = CreateEvent(NULL, TRUE, FALSE, NULL);   // manual-reset event for OVERLAPPED
    // FILE_FLAG_NO_BUFFERING requires sector-aligned buffers;
    // VirtualAlloc returns page-aligned memory, which satisfies that (malloc does not)
    pbuffer[i] = (unsigned char*) VirtualAlloc(NULL, bufsize, MEM_COMMIT, PAGE_READWRITE);
}
ULONGLONG CurOffset = 0;
OVERLAPPED r[NUM_IO];

do {
    for (int i = 0; i < NUM_IO; ++i) {
        memset(&r[i], 0, sizeof(OVERLAPPED));
        r[i].Offset     = (DWORD)(CurOffset & 0xFFFFFFFF);
        r[i].OffsetHigh = (DWORD)(CurOffset >> 32);   // an 18 GB file needs the high half
        r[i].hEvent     = ev[i];
        CurOffset += bufsize;
        // for overlapped reads, pass NULL for lpNumberOfBytesRead and get the
        // count from GetOverlappedResult later
        if (!ReadFile(hf, pbuffer[i], bufsize, NULL, &r[i])
            && GetLastError() != ERROR_IO_PENDING) {
            // check for error AND ERROR_HANDLE_EOF (important)
        }
    }

    // wait for the events to indicate data present
    // send data to consuming threads
    // allocate new buffers
} while ( /* not eof, etc. */ true );
The above is the bones of what you need. We use this and achieve high I/O throughput rates, but you may need to improve it slightly to achieve ultimate performance. We found that 4 outstanding I/Os were best for our use, but this will vary by platform. Reading less than 1 MB per I/O hurt performance. Once a buffer has been read, don't try to consume it in the reading loop; post it to another thread and allocate another buffer (but get buffers from a reuse queue, don't keep calling malloc). The overall intent of the above is to keep 4 outstanding I/Os open against the disk; as soon as you don't have this, overall performance drops.
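To illustrate the reuse queue mentioned above (my sketch, not the answerer's code; BufferPool is a hypothetical name): consumers return finished buffers to a free list guarded by a CRITICAL_SECTION, and the reading loop draws from it instead of allocating per read:

    #include <windows.h>
    #include <deque>

    // Trivial buffer pool: consuming threads put() buffers back when done,
    // and the reader get()s one instead of calling malloc each time.
    class BufferPool {
        CRITICAL_SECTION           lock;
        std::deque<unsigned char*> free_;
        SIZE_T                     bufsize;
    public:
        explicit BufferPool(SIZE_T size) : bufsize(size) { InitializeCriticalSection(&lock); }
        ~BufferPool() {
            while (!free_.empty()) { VirtualFree(free_.front(), 0, MEM_RELEASE); free_.pop_front(); }
            DeleteCriticalSection(&lock);
        }
        unsigned char* get() {
            EnterCriticalSection(&lock);
            unsigned char* p = NULL;
            if (!free_.empty()) { p = free_.front(); free_.pop_front(); }
            LeaveCriticalSection(&lock);
            if (p == NULL)   // pool empty: allocate a fresh page-aligned buffer
                p = (unsigned char*) VirtualAlloc(NULL, bufsize, MEM_COMMIT, PAGE_READWRITE);
            return p;
        }
        void put(unsigned char* p) {
            EnterCriticalSection(&lock);
            free_.push_back(p);
            LeaveCriticalSection(&lock);
        }
    };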

Also, this works best on a disk that is only reading your file. If you start reading/writing different files on the same disk at the same time, performance drops quickly, unless you have SSDs!

Not sure why your ReadFile is failing for 0.5 MB buffers; I just double-checked, and our live production code uses 4 MB buffers.



Source: https://stackoverflow.com/questions/16100024/optimization-of-sequential-i-o-operations-on-large-file-sizes
