Concatenate two huge files in C++

感动是毒 2021-02-06 01:55

I have two text files of a hundred-plus megabytes each (written with std::ofstream) and I want to concatenate them. Using fstreams to pull all of the data into memory so I can write out a single file usually ends up with an out-of-memory error.

4 Answers
  • 2021-02-06 01:57

    Assuming you don't want to do any processing, and just want to concatenate two files to make a third, you can do this very simply by streaming the files' buffers:

    std::ifstream if_a("a.txt", std::ios_base::binary);
    std::ifstream if_b("b.txt", std::ios_base::binary);
    std::ofstream of_c("c.txt", std::ios_base::binary);
    
    of_c << if_a.rdbuf() << if_b.rdbuf();
    

    I have tried this sort of thing with files of up to 100 MB in the past and had no problems. You effectively let C++ and the libraries handle any buffering that's required. It also means that you don't need to worry about file positions if your files get really big.

    Alternatively, if you just want to copy b.txt onto the end of a.txt, open a.txt with the append flag and seek to the end:

    std::ofstream of_a("a.txt", std::ios_base::binary | std::ios_base::app);
    std::ifstream if_b("b.txt", std::ios_base::binary);
    
    of_a.seekp(0, std::ios_base::end);
    of_a << if_b.rdbuf();
    

    These methods work by passing the std::streambuf of the input stream to the output stream's operator<<, which has an overload that takes a pointer to a streambuf. When there are no errors, the streambuf is inserted unformatted into the output stream until end of file.
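
    A small sketch of one way to check that the insertion succeeded (the file names are just the same placeholders as above): operator<< sets failbit on the output stream if it could not insert any characters, so testing the stream state after the statement catches most problems. Note that an empty input file also sets failbit, because zero characters get inserted.

    #include <fstream>
    #include <iostream>
    
    int main() {
        std::ifstream if_a("a.txt", std::ios_base::binary);
        std::ifstream if_b("b.txt", std::ios_base::binary);
        std::ofstream of_c("c.txt", std::ios_base::binary);
    
        // failbit is set on of_c if either rdbuf() insertion
        // could not insert any characters.
        if (!(of_c << if_a.rdbuf() << if_b.rdbuf())) {
            std::cerr << "concatenation failed\n";
            return 1;
        }
    }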

  • 2021-02-06 02:06

    On Windows:

    system ("copy File1+File2 OutputFile");
    

    On Linux:

    system ("cat File1 File2 > OutputFile");
    

    But the answer is simple: don't read the whole file into memory. Read the input files in small blocks, for example (sketched below with C stdio):

    #include <cstdio>
    
    // Copy input_file onto the end of output_file in fixed-size blocks,
    // so only one small buffer is ever held in memory.
    // (Error checking of fopen/fread/fwrite omitted for brevity.)
    void Cat (std::FILE* input_file, std::FILE* output_file)
    {
      char buffer [8192];
      std::size_t bytes_read;
      while ((bytes_read = std::fread (buffer, 1, sizeof buffer, input_file)) != 0)
      { 
        std::fwrite (buffer, 1, bytes_read, output_file);
      }
    }
    
    int main ()
    {
       std::FILE* output_file = std::fopen ("OutputFile", "wb");
    
       std::FILE* input_file = std::fopen ("File1", "rb");
       Cat (input_file, output_file);
       std::fclose (input_file);
    
       input_file = std::fopen ("File2", "rb");
       Cat (input_file, output_file);
       std::fclose (input_file);
    
       std::fclose (output_file);
    }
    
  • 2021-02-06 02:08

    "Is there any way of merging them faster than O(n)?"

    That would mean processing the data without passing over it even once. You cannot merge the files without reading them at least once, so the short answer is no.

    For reading the data, consider unformatted block reads (look at std::fstream::read).

  • 2021-02-06 02:09

    It really depends on whether you wish to use "pure" C++ for this. Personally, at the cost of portability, I would be tempted to write:

    #include <cstdlib>
    #include <sstream>
    
    int main(int argc, char* argv[]) {
        std::ostringstream command;
    
        command << "cat "; // Linux Only, command for Windows is slightly different
    
        for (int i = 2; i < argc; ++i) { command << argv[i] << " "; }
    
        command << "> ";
    
        command << argv[1];
    
        return system(command.str().c_str());
    }
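
    (With this program, the first command-line argument names the output file and the remaining arguments are the input files to concatenate; a hypothetical invocation would be ./concat OutputFile File1 File2.)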
    

    Is it good C++ code? No, not really (non-portable and does not escape command arguments).

    But it'll get you way ahead of where you are standing now.

    As for a "real" C++ solution, with all the ugliness that streams could manage...

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>
    
    static std::size_t const BufferSize = 8192; // 8 KB
    
    void appendFile(std::string const& outFile, std::string const& inFile) {
        std::ofstream out(outFile, std::ios_base::app |
                                   std::ios_base::binary |
                                   std::ios_base::out);
    
        std::ifstream in(inFile, std::ios_base::binary |
                                 std::ios_base::in);
    
        std::vector<char> buffer(BufferSize);
        while (in.read(&buffer[0], buffer.size())) {
            out.write(&buffer[0], buffer.size());
        }
    
        // The final "read" fails when it encounters EOF,
        // but it may still have read *some* bytes into the buffer,
        // so write out however many gcount() reports.
        out.write(&buffer[0], in.gcount());
    }
    
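    For completeness, a minimal usage sketch of appendFile (the file names here are just placeholders): since the output is opened in append mode, remove any stale output first, then call appendFile once per input.

    #include <cstdio>  // std::remove
    
    int main() {
        std::remove("c.txt");          // start from scratch; appendFile always appends
        appendFile("c.txt", "a.txt");  // c.txt = a.txt
        appendFile("c.txt", "b.txt");  // c.txt = a.txt + b.txt
    }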