I was working on a simple parser, and when profiling I observed that the bottleneck was in... file read! So I extracted a very simple test to compare the performance of fstreams.
TL;DR: Try adding this to your code before doing the writing:
const size_t bufsize = 256*1024;
char buf[bufsize];
mystream.rdbuf()->pubsetbuf(buf, bufsize);
When working with large files with fstream, make sure to use a stream buffer.
Counterintuitively, disabling stream buffering dramatically reduces performance. At least the MSVC implementation copies 1 char at a time to the filebuf when no buffer is set (see streambuf::xsputn()), which can make your application CPU-bound and result in lower I/O rates.
NB: You can find a complete sample application here.
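For illustration, here is a minimal self-contained sketch of the same idea (this is not the linked sample; the file name test.bin and the sizes are arbitrary, and on some implementations pubsetbuf only takes effect if called before the file is opened):
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    const std::size_t bufsize = 256 * 1024;
    std::vector<char> buf(bufsize);              // stream buffer; must outlive all writes on the stream

    std::ofstream out;
    out.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(bufsize)); // install the buffer before opening
    out.open("test.bin", std::ios::binary);

    std::vector<char> data(64 * 1024 * 1024, 0); // 64 MiB of zeros, just something to write
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}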
Contrary to other answers, a big issue with large file reads comes from buffering by the C standard library. Try using the low-level read/write calls in large chunks (1024 KB) and see the performance jump.
File buffering by the C library is useful for reading or writing small chunks of data (smaller than disk block size).
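As a rough sketch of what "large chunks through the low-level calls" can look like on POSIX (not the code those measurements came from; the 1 MiB chunk size and the file name are arbitrary):
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t chunk = 1024 * 1024;               // 1 MiB per read() call
    std::vector<char> buf(chunk);

    int fd = open("input.bin", O_RDONLY);                // raw descriptor, no stdio buffering
    if (fd < 0) { std::perror("open"); return 1; }

    std::size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0)
        total += static_cast<std::size_t>(n);

    close(fd);
    std::printf("read %zu bytes\n", total);
}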
On Windows I got almost a 3x performance boost dropping file buffering when reading and writing raw video streams.
I also opened the file using native OS (win32) API calls and told the OS not to cache the file as this involves yet another copy.
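A hedged sketch of the Win32 side of that idea (not the original code): FILE_FLAG_NO_BUFFERING tells the OS not to cache the file, but it also requires sector-aligned buffers and transfer sizes, which is why the buffer below is allocated with _aligned_malloc and the chunk size is a multiple of 4096.
#include <windows.h>
#include <malloc.h>
#include <cstdio>

int main()
{
    const DWORD chunk = 1024 * 1024;                     // must be a multiple of the sector size
    char* buf = static_cast<char*>(_aligned_malloc(chunk, 4096)); // sector-aligned buffer

    // FILE_FLAG_NO_BUFFERING asks the OS to skip its own file cache.
    HANDLE h = CreateFileA("input.bin", GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    if (h == INVALID_HANDLE_VALUE) { std::printf("CreateFile failed\n"); return 1; }

    unsigned long long total = 0;
    DWORD n = 0;
    while (ReadFile(h, buf, chunk, &n, nullptr) && n > 0)
        total += n;

    CloseHandle(h);
    _aligned_free(buf);
    std::printf("read %llu bytes\n", total);
}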
It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev.
I'm not sure WHY writev is so much slower than write, but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream needs to use that construct in this case.
This can easily be seen by using strace ./a.out (where a.out is the program testing this).
Output:
Fstream:
clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824}], 2) = 1073741824
close(3) = 0
clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
write(1, "fstream write 13261.8 ms\n", 25fstream write 13261.8 ms) = 25
FILE*:
clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 1073741824
clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
write(1, "FILE* write 1653.87 ms\n", 23FILE* write 1653.87 ms) = 23
I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.
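For context, the programs being straced above boil down to writing one large zero-filled buffer through each API; a sketch of the shape of those tests (not the question's original code, and with the timing omitted) would be:
#include <fstream>
#include <cstdio>
#include <cstring>

const std::size_t SIZE = 1024u * 1024u * 1024u;          // 1 GB of zeros

int main()
{
    char* buffer = new char[SIZE];
    std::memset(buffer, 0, SIZE);

    {   // fstream variant - the one that showed up as writev() in the strace above
        std::ofstream stream("test", std::ios::binary);
        stream.write(buffer, static_cast<std::streamsize>(SIZE));
    }

    {   // FILE* variant - the one that showed up as a plain write()
        std::FILE* file = std::fopen("test", "wb");
        std::fwrite(buffer, 1, SIZE, file);
        std::fclose(file);
    }

    delete[] buffer;
}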
As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

// Time a callable and print the result in milliseconds.
void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    function();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::high_resolution_clock::now() - start_time);
    std::cout << test << " " << static_cast<double>(duration.count()) * 0.000001 << " ms" << std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    // What fstream ends up doing: one writev() with an empty first element.
    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660); // O_CREAT requires a mode argument
        struct iovec vec[] =
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    // The plain write() equivalent.
    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}
It is the actual fstream implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.
And the result is pretty much identical for both cases, and faster than both the fstream and FILE* variants in the question.
Edit:
It would seem like, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE* - on my system, around 13 seconds to write 1 GB - with old-style spinning-disk drives, not SSD.
I can however write MUCH faster using this code:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

// Time a callable and print the result in milliseconds.
void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    function();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::high_resolution_clock::now() - start_time);
    std::cout << test << " " << static_cast<double>(duration.count()) * 0.000001 << " ms" << std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    // Single writev() of the whole buffer, as the fstream path does.
    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        struct iovec vec[] =
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    // Single write() of the whole buffer.
    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}
This gives times of about 650-900 ms.
I can also edit the original program to give a time of approximately 1000 ms for fwrite - simply remove the fclose.
I also added this method:
measure("fstream write (new)", [buffer]()
{
std::ofstream* stream = new std::ofstream("test", std::ios::binary);
stream->write(buffer, BUFFER_SIZE);
// Intentionally no delete.
});
and then it takes about 1000 ms here too.
So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...
The stream is somehow broken on the Mac - an old implementation or setup. An old setup could cause the FILE to be written to the exe directory and the stream to the user directory; this shouldn't make any difference unless you have two disks or some other different setup.
On my lousy Vista I get
Normal buffer+Uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms
Normal buffer+Cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms
Large buffer+Cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms
This shows that the FILE* write is faster than the fstream write, but slower than the fstream on reads ... but all numbers are within ~10% of each other.
Try adding some more buffering to your stream to see if that helps.
const int MySize = 1024*1024;
char MrBuf[MySize];
stream.rdbuf()->pubsetbuf(MrBuf, MySize);
The equivalent for FILE* is
const int MySize = 1024*1024;
if (setvbuf ( file , NULL , _IOFBF , MySize ) != 0)  // setvbuf returns 0 on success
    DieInDisgrace();
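A self-contained sketch of how that might be wired up (the file name and data size are arbitrary, and DieInDisgrace() above is just a stand-in for your own error handling):
#include <cstdio>
#include <vector>

int main()
{
    const int MySize = 1024 * 1024;
    std::vector<char> MrBuf(MySize);                 // heap buffer instead of 1 MB on the stack

    std::FILE* file = std::fopen("test.bin", "wb");
    if (!file)
        return 1;

    // setvbuf must be called after fopen but before any other operation on the stream.
    if (std::setvbuf(file, MrBuf.data(), _IOFBF, MySize) != 0)
        return 1;

    std::vector<char> data(16 * 1024 * 1024, 0);     // 16 MiB of zeros, just something to write
    std::fwrite(data.data(), 1, data.size(), file);
    std::fclose(file);
}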
A side note for anyone interested. The main keywords are Windows 2016 server / CloseHandle.
In our app we discovered a NASTY bug on Windows 2016 server.
Our std code under EVERY Windows version takes (ms):
time CreateFile/SetFilePointer 1 WriteFile 0 CloseHandle 0
On Windows 2016 we got:
time CreateFile/SetFilePointer 1 WriteFile 0 CloseHandle 275
And the time grows with the size of the file, which is ABSURD.
After a LOT of investigation (we first found that "CloseHandle" was the culprit...) we discovered that under Windows 2016 MS attached a "hook" in the close function that triggers "Windows Defender" to scan ALL of the file and prevents the call from returning until the scan is done (in other words, scanning is synchronous, which is PURE MADNESS).
When we added an exclusion in "Defender" for our file, everything worked fine. I think this is a BAD design; no antivirus should stall normal file activity INSIDE a program just to scan files. (MS can do it as they have the power to do so.)
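For anyone who wants to check their own machine, a minimal sketch of that kind of measurement, timing each Win32 call separately (the file name, the 256 MiB size, and the ms() helper are arbitrary choices, not our production code):
#include <windows.h>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical helper: time a single call in milliseconds.
template <typename F>
double ms(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    std::vector<char> data(256 * 1024 * 1024, 0);    // 256 MiB of zeros
    HANDLE h = INVALID_HANDLE_VALUE;
    DWORD written = 0;

    double tCreate = ms([&] {
        h = CreateFileA("defender_test.bin", GENERIC_WRITE, 0, nullptr,
                        CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    });
    double tWrite = ms([&] {
        WriteFile(h, data.data(), static_cast<DWORD>(data.size()), &written, nullptr);
    });
    double tClose = ms([&] {
        CloseHandle(h);                              // this is where a synchronous scan would show up
    });

    std::printf("CreateFile %.1f ms  WriteFile %.1f ms  CloseHandle %.1f ms\n",
                tCreate, tWrite, tClose);
}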