Question
I have two text files, written via std::ofstream, of a hundred-plus megabytes each, and I want to concatenate them. Using fstreams to pull all the data into memory so I can write a single file usually ends in an out-of-memory error because the combined size is too big.
Is there any way of merging them faster than O(n)?
File 1 (160MB):
0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659
File 2 (120MB):
a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a
Merged (280MB):
0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659
a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a
File generation:
std::ofstream a_file("file1.txt");
std::ofstream b_file("file2.txt");

while (/* whatever */) {
    a_file << num << std::endl;
}
while (/* whatever */) {
    b_file << character << std::endl;
}

a_file.close();
b_file.close();
// merge them here; it doesn't matter if the output is one of them or a new file
Answer 1:
Assuming you don't want to do any processing, and just want to concatenate two files to make a third, you can do this very simply by streaming the files' buffers:
std::ifstream if_a("a.txt", std::ios_base::binary);
std::ifstream if_b("b.txt", std::ios_base::binary);
std::ofstream of_c("c.txt", std::ios_base::binary);
of_c << if_a.rdbuf() << if_b.rdbuf();
I have tried this sort of thing with files of up to 100 MB in the past and had no problems. You effectively let C++ and the standard library handle whatever buffering is required, and it means you don't need to worry about tracking file positions as the files get really big.
An alternative, if you just want to copy b.txt onto the end of a.txt, is to open a.txt with the append flag (with std::ios_base::app every write already lands at the end of the file, so the explicit seekp below is just belt and braces):
std::ofstream of_a("a.txt", std::ios_base::binary | std::ios_base::app);
std::ifstream if_b("b.txt", std::ios_base::binary);
of_a.seekp(0, std::ios_base::end);
of_a << if_b.rdbuf();
These methods work by passing the std::streambuf of the input stream to the output stream's operator<<, which has an overload taking a streambuf pointer. In the absence of errors, the streambuf is inserted unformatted into the output stream until end of file.
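For completeness, here is a self-contained sketch of the first approach with basic open-failure checks (file names as above; one caveat: if an input file is empty, inserting its rdbuf() sets failbit on the output stream):

#include <fstream>
#include <iostream>

int main() {
    std::ifstream if_a("a.txt", std::ios_base::binary);
    std::ifstream if_b("b.txt", std::ios_base::binary);
    std::ofstream of_c("c.txt", std::ios_base::binary);

    if (!if_a || !if_b || !of_c) {
        std::cerr << "failed to open one of the files\n";
        return 1;
    }

    // Stream both input buffers straight into the output file.
    of_c << if_a.rdbuf() << if_b.rdbuf();

    return of_c ? 0 : 1;
}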
Answer 2:
Is there any way of merging them faster than O(n)?
That would mean producing the merged output without passing over the data even once. You cannot merge the files without reading each of them at least once, so the short answer is: no.
For reading the data, consider unformatted reads (see std::istream::read, which the file streams inherit).
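A minimal sketch of such a block-read loop (the buffer size and file names are placeholders of my choosing):

#include <fstream>

int main() {
    std::ifstream in("file1.txt", std::ios_base::binary);
    std::ofstream out("merged.txt", std::ios_base::binary);
    char buffer[64 * 1024];

    // read() extracts raw bytes with no formatting overhead;
    // gcount() reports how many bytes the last read actually delivered.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        out.write(buffer, in.gcount());
    }
}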
Answer 3:
On Windows (add /b for a binary-safe copy):
system ("copy File1+File2 OutputFile");
On Linux:
system ("cat File1 File2 > OutputFile");
But the real answer is simple: don't read the whole file into memory! Read the input files in small blocks:
#include <fstream>

// Copy the whole input stream to the output in fixed-size blocks,
// so memory use stays constant no matter how large the files are.
void Cat(std::istream& input_file, std::ostream& output_file)
{
    char buffer[64 * 1024];
    while (input_file.read(buffer, sizeof buffer) || input_file.gcount() > 0)
    {
        output_file.write(buffer, input_file.gcount());
    }
}

int main()
{
    std::ofstream output_file("OutputFile", std::ios_base::binary);

    std::ifstream input_file("File1", std::ios_base::binary);
    Cat(input_file, output_file);
    input_file.close();

    input_file.clear(); // reset the EOF/fail state before reusing the stream
    input_file.open("File2", std::ios_base::binary);
    Cat(input_file, output_file);
}
Answer 4:
It really depends on whether you wish to use "pure" C++ for this. Personally, at the cost of portability, I would be tempted to write:
#include <cstdlib>
#include <sstream>

int main(int argc, char* argv[]) {
    std::ostringstream command;
    command << "cat "; // Linux only; the Windows command is slightly different
    for (int i = 2; i < argc; ++i) { command << argv[i] << " "; }
    command << "> ";
    command << argv[1];
    return std::system(command.str().c_str());
}
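Invoked as, say, ./concat c.txt a.txt b.txt (assuming the binary is called concat), this just shells out to cat a.txt b.txt > c.txt.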
Is it good C++ code? No, not really: it is non-portable and does not escape its command arguments.
But it'll get you way ahead of where you are standing now.
As for a "real" C++ solution, with all the ugliness that streams could manage...
#include <fstream>
#include <string>
#include <vector>

static size_t const BufferSize = 8192; // 8 KB

void appendFile(std::string const& outFile, std::string const& inFile) {
    std::ofstream out(outFile, std::ios_base::app |
                               std::ios_base::binary |
                               std::ios_base::out);
    std::ifstream in(inFile, std::ios_base::binary |
                             std::ios_base::in);

    std::vector<char> buffer(BufferSize);
    while (in.read(&buffer[0], buffer.size())) {
        out.write(&buffer[0], buffer.size());
    }

    // read() fails when it hits end-of-file, but it may still have
    // extracted some final bytes into the buffer; gcount() says how many.
    out.write(&buffer[0], in.gcount());
}
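To reproduce the question's merge with this helper, you could call it twice (the output file name here is my own choice):

int main() {
    appendFile("merged.txt", "file1.txt");
    appendFile("merged.txt", "file2.txt");
}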
Source: https://stackoverflow.com/questions/19564450/concatenate-two-huge-files-in-c