Why are std::fstreams so slow?

不思量自难忘° 2021-01-30 20:49

I was working on a simple parser, and when profiling I observed that the bottleneck is in... file read! I extracted a very simple test to compare the performance of fstreams

5 answers
  •  不思量自难忘°
    2021-01-30 21:22

    It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev.

    I'm not sure WHY writev is so much slower than write, but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream needs to use that construct in this case.

    This can easily be seen by using strace ./a.out (where a.out is the program testing this).

    Output:

    Fstream:

    clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
    open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    writev(3, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824}], 2) = 1073741824
    close(3)                                = 0
    clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
    write(1, "fstream write 13261.8 ms\n", 25fstream write 13261.8 ms
    ) = 25

    FILE*:

    clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
    open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 1073741824
    clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
    write(1, "FILE* write 1653.87 ms\n", 23FILE* write 1653.87 ms
    ) = 23

    I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.

    As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:

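    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <string>
    
    // Runs `function` once and prints how long it took, in milliseconds.
    void measure(const std::string& test, std::function<void()> function)
    {
        auto start_time = std::chrono::high_resolution_clock::now();
    
        function();
    
        // Nanosecond ticks scaled by 1e-6 give milliseconds.
        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
        std::cout << test << " " << static_cast<double>(duration.count()) * 0.000001 << " ms" << std::endl;
    }
    
    // ... (the rest of the listing is missing)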

    It is the actual fstream implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.

    And the result is pretty much identical for both cases, and faster than both fstream and FILE* variants in the question.
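
A minimal sketch of such a write versus writev comparison, assuming Linux/POSIX. The helper name `timed_write`, the file name and the buffer size are made up for illustration and are not taken from the original program. The iovec array mirrors the shape seen in the strace output (an empty first element followed by the whole buffer).

```cpp
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

#include <chrono>
#include <cstddef>

// Hypothetical helper: writes `size` bytes from `data` to `path` with a
// single write() or writev() call and returns the elapsed milliseconds,
// or -1.0 if the file could not be opened or the call wrote fewer bytes.
double timed_write(const char* path, const char* data, std::size_t size, bool use_writev)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1.0;

    auto start = std::chrono::steady_clock::now();
    ssize_t written;
    if (use_writev) {
        // Same layout as in the strace output: {NULL, 0} then the buffer.
        struct iovec vec[2];
        vec[0].iov_base = nullptr;
        vec[0].iov_len = 0;
        vec[1].iov_base = const_cast<char*>(data);
        vec[1].iov_len = size;
        written = writev(fd, vec, 2);
    } else {
        written = write(fd, data, size);
    }
    auto stop = std::chrono::steady_clock::now();
    close(fd);  // deliberately outside the timed region

    if (written != static_cast<ssize_t>(size))
        return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `timed_write("test", buffer, size, true)` and then the same with `false` gives two numbers to compare. Only the system call itself is timed here, so the figures mostly reflect copying into the kernel's page cache rather than the disk.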

    Edit:

    It would seem like, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE* - on my system, around 13 seconds to write 1GB - with old style spinning disk type drives, not SSD.
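
To make that comparison concrete, here is a rough sketch that times both variants with the close inside the measured region. The helper names (`elapsed_ms`, `time_file_write`, `time_fstream_write`) are hypothetical, and the caller supplies the buffer, so nothing here assumes the 1 GB size used above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed milliseconds.
double elapsed_ms(const std::function<void()>& work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Writes the buffer through FILE*; fclose() is inside the timed region.
double time_file_write(const char* path, const char* data, std::size_t size)
{
    return elapsed_ms([=] {
        std::FILE* file = std::fopen(path, "wb");
        if (!file)
            return;
        std::fwrite(data, 1, size, file);
        std::fclose(file);
    });
}

// Writes the buffer through std::ofstream; close() is inside the timed region.
double time_fstream_write(const char* path, const char* data, std::size_t size)
{
    return elapsed_ms([=] {
        std::ofstream stream(path, std::ios::binary);
        stream.write(data, static_cast<std::streamsize>(size));
        stream.close();
    });
}
```

Running both helpers on the same buffer, ideally several times and in alternating order, shows whether the two paths really converge once the close is counted. Moving the `fclose`/`close` call out of the lambda reproduces the other measurement.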

    I can however write MUCH faster using this code:

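    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <string>
    
    // Runs `function` once and prints how long it took, in milliseconds.
    void measure(const std::string& test, std::function<void()> function)
    {
        auto start_time = std::chrono::high_resolution_clock::now();
    
        function();
    
        // Nanosecond ticks scaled by 1e-6 give milliseconds.
        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
        std::cout << test << " " << static_cast<double>(duration.count()) * 0.000001 << " ms" << std::endl;
    }
    
    // ... (the rest of the listing is missing)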

    gives times of about 650-900 ms.

    I can also edit the original program to give a time of approximately 1000 ms for fwrite: simply remove the fclose.

    I also added this method:

    measure("fstream write (new)", [buffer]()
    {
        std::ofstream* stream = new std::ofstream("test", std::ios::binary);
        stream->write(buffer, BUFFER_SIZE);
        // Intentionally no delete.
    });
    

    and then it takes about 1000 ms here too.

    So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...
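
One way to take the "sometimes it flushes, sometimes it doesn't" variable out of the measurement is to force the data to the device explicitly before stopping the clock. The sketch below assumes a POSIX system (`fsync` and `fileno` are not standard C++), and the helper name `time_synced_write` is made up for illustration. `fflush` only hands the stdio buffer to the kernel; `fsync` then asks the kernel to push its cached pages to storage.

```cpp
#include <stdio.h>    // fopen, fwrite, fflush, fclose, fileno (POSIX)
#include <unistd.h>   // fsync (POSIX)

#include <chrono>
#include <cstddef>

// Hypothetical helper: writes the buffer with fwrite, then forces it out to
// the storage device before the clock stops. Returns elapsed milliseconds,
// or -1.0 if opening, writing, flushing or syncing fails.
double time_synced_write(const char* path, const char* data, std::size_t size)
{
    FILE* file = fopen(path, "wb");
    if (!file)
        return -1.0;

    auto start = std::chrono::steady_clock::now();
    bool ok = fwrite(data, 1, size, file) == size;
    ok = ok && fflush(file) == 0;          // stdio buffer -> kernel
    ok = ok && fsync(fileno(file)) == 0;   // kernel page cache -> device
    auto stop = std::chrono::steady_clock::now();

    fclose(file);
    if (!ok)
        return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With the sync inside the timed region, the result no longer depends on whether closing the file happens to trigger a flush, so runs become comparable. The numbers will usually be higher than the unsynced ones, since they include the actual disk write.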
