Question
Compiler: Microsoft C++ 2005
Hardware: AMD 64-bit (16 GB)
Sequential, read-only access to an 18 GB file is performed with the following timing, file-access, and file-structure characteristics:

18,184,359,164 bytes (file length)
11,240,476,672 bytes (NTFS-compressed file length)
Time    File        Method                            Disk
14:33?  compressed  fstream                           fixed disk
14:06   normal      fstream                           fixed disk
12:22   normal      winapi                            fixed disk
11:47   compressed  winapi                            fixed disk
11:29   compressed  fstream                           ram disk
10:37   compressed  winapi                            ram disk
 7:18   compressed  7z stored decompression to NTFS   12 GB ram disk
 6:37   normal      copy to same volume               fixed disk
The fstream constructor and access:
#define BUFFERSIZE 524288

unsigned int mbytes = BUFFERSIZE;
char* databuffer0 = (char*) malloc(mbytes);

ifstream datafile;   // std::ifstream
datafile.open("drv:/file.ext", ios::in | ios::binary);
datafile.read(databuffer0, mbytes);
The winapi constructor and access:
#define BUFFERSIZE 524288

unsigned int mbytes = BUFFERSIZE;
const TCHAR* const filex = _T("drv:/file.ext");
char ReadBuffer[BUFFERSIZE] = {0};
DWORD dwBytesRead = 0;

HANDLE hFile = CreateFile(filex, GENERIC_READ, FILE_SHARE_READ, NULL,
                          OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (FALSE == ReadFile(hFile, ReadBuffer, BUFFERSIZE-1, &dwBytesRead, NULL)) {
...
For the fstream method, buffer sizes up to 16 MB do not decrease processing time. For the winapi method, all buffer sizes beyond 0.5 MB fail. What methods would optimize this implementation with respect to processing time?
Answer 1:
Did you try memory-mapping the file? In my test this was always the fastest way to read large files.
Update: Here's an old, but still accurate description of memory mapped files: http://msdn.microsoft.com/en-us/library/ms810613.aspx
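For reference, a minimal sketch of what the memory-mapped approach might look like with the Win32 API. This is not from the original answer: error handling is omitted, the path is the placeholder from the question, and a 64-bit process is assumed so the whole 18 GB file fits in a single view.

#include <windows.h>
#include <tchar.h>

// FILE_FLAG_SEQUENTIAL_SCAN could also be passed as a read-ahead hint.
HANDLE hFile = CreateFile(_T("drv:/file.ext"), GENERIC_READ, FILE_SHARE_READ,
                          NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);

// Map the whole file read-only. A 64-bit process can hold an 18 GB view;
// a 32-bit process would have to map and unmap a sliding window instead.
HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
const unsigned char* base =
    (const unsigned char*) MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);

// Touch the data sequentially; the OS pages it in on demand.
// The checksum stands in for whatever processing the real code does.
unsigned long long checksum = 0;
for (LONGLONG pos = 0; pos < size.QuadPart; ++pos)
    checksum += base[pos];

UnmapViewOfFile(base);
CloseHandle(hMap);
CloseHandle(hFile);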
Answer 2:
Try this.
hf = CreateFile(..... FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED ...)
Then the reading loop. Minor details are omitted, as this was typed on an iPad...
int bufsize = 4*1024*1024;   // 4 MB per request; with FILE_FLAG_NO_BUFFERING this must be a multiple of the sector size

CEvent e1;                   // one event per outstanding request, signalled on completion
CEvent e2;
CEvent e3;
CEvent e4;

// With FILE_FLAG_NO_BUFFERING the buffers should also be sector-aligned (e.g. VirtualAlloc)
unsigned char* pbuffer1 = (unsigned char*) malloc(bufsize);
unsigned char* pbuffer2 = (unsigned char*) malloc(bufsize);
unsigned char* pbuffer3 = (unsigned char*) malloc(bufsize);
unsigned char* pbuffer4 = (unsigned char*) malloc(bufsize);

ULONGLONG CurOffset = 0;     // 64-bit offset: the file is larger than 4 GB, so OffsetHigh is needed too

do {
    OVERLAPPED r1;
    memset(&r1, 0, sizeof(OVERLAPPED));
    r1.Offset     = (DWORD) (CurOffset & 0xFFFFFFFF);
    r1.OffsetHigh = (DWORD) (CurOffset >> 32);
    r1.hEvent     = e1;
    CurOffset += bufsize;
    if (! ReadFile(hf, pbuffer1, bufsize, NULL, &r1)) {
        // check for ERROR_IO_PENDING (the normal overlapped case) AND ERROR_HANDLE_EOF (important)
    }

    OVERLAPPED r2;
    memset(&r2, 0, sizeof(OVERLAPPED));
    r2.Offset     = (DWORD) (CurOffset & 0xFFFFFFFF);
    r2.OffsetHigh = (DWORD) (CurOffset >> 32);
    r2.hEvent     = e2;
    CurOffset += bufsize;
    if (! ReadFile(hf, pbuffer2, bufsize, NULL, &r2)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF
    }

    OVERLAPPED r3;
    memset(&r3, 0, sizeof(OVERLAPPED));
    r3.Offset     = (DWORD) (CurOffset & 0xFFFFFFFF);
    r3.OffsetHigh = (DWORD) (CurOffset >> 32);
    r3.hEvent     = e3;
    CurOffset += bufsize;
    if (! ReadFile(hf, pbuffer3, bufsize, NULL, &r3)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF
    }

    OVERLAPPED r4;
    memset(&r4, 0, sizeof(OVERLAPPED));
    r4.Offset     = (DWORD) (CurOffset & 0xFFFFFFFF);
    r4.OffsetHigh = (DWORD) (CurOffset >> 32);
    r4.hEvent     = e4;
    CurOffset += bufsize;
    if (! ReadFile(hf, pbuffer4, bufsize, NULL, &r4)) {
        // check for ERROR_IO_PENDING AND ERROR_HANDLE_EOF
    }

    // wait for the events to indicate the data is present
    // send the filled buffers to consuming threads
    // allocate (or recycle) new buffers for the next round
} while ( /* not EOF, etc. */ );
The above is the bones of what you need. We use this and achieve high I/O throughput rates, but you will probably need to refine it slightly to reach ultimate performance. We found four outstanding I/Os to be best for our use, though this will vary by platform; reading less than 1 MB per I/O hurt performance. Once a buffer has been read, don't try to consume it in the reading loop: post it to another thread and allocate another buffer (but get the buffers from a reuse queue, don't keep calling malloc). The overall intent of the above is to keep four I/Os outstanding against the disk; as soon as you don't have this, overall performance will drop.
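The "wait for events" step that the loop above only sketches in comments could look roughly like this. It is an assumed sketch, not part of the original answer: it reuses hf, r1, and pbuffer1 from the loop, and the consumer-queue hand-off is left as a comment because that part is application-specific.

// Hypothetical completion step for one outstanding request (repeat for r2..r4).
// GetOverlappedResult with bWait = TRUE blocks on r1.hEvent until the read finishes.
DWORD bytesRead = 0;
if (!GetOverlappedResult(hf, &r1, &bytesRead, TRUE))
{
    if (GetLastError() == ERROR_HANDLE_EOF)
    {
        // end of file reached: stop issuing new reads after draining the remaining requests
    }
    else
    {
        // genuine I/O error: abort
    }
}
// pbuffer1 now holds bytesRead bytes: hand it to a consumer thread's queue and
// take a recycled buffer in its place rather than calling malloc again.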
Also, this works best on a disk that is only reading your file. If you start reading or writing different files on the same disk at the same time, performance drops quickly, unless you have SSDs!
I'm not sure why your ReadFile fails with buffers above 0.5 MB; I just double-checked, and our live production code uses 4 MB buffers.
Source: https://stackoverflow.com/questions/16100024/optimization-of-sequential-i-o-operations-on-large-file-sizes