I have about 50 CSV files with 60,000 rows in each, and a varying number of columns. I want to merge all the CSV files by column. I\'ve tried doing this in MATLAB by transposing
Horizontal concatenation really is trivial. Considering you know C++, I'm surprised you used MATLAB. Processing a GB or so of data in the way you're doing should be in the order of seconds, not days.
By your description, no CSV processing is actually required. The easiest approach is to just do it in RAM.
vector< vector > data( num_files );
for( int i = 0; i < num_files; i++ ) {
ifstream input( filename[i] );
string line;
while( getline(input, line) ) data[i].push_back(line);
}
(Do obvious sanity checks, such as making sure all vectors are the same length...)
Now you have everything, dump it:
ofstream output("concatenated.csv");
for( int row = 0; row < num_rows; row++ ) {
for( int f = 1; f < num_files; f++ ) {
if( f == 0 ) output << ",";
output << data[f][row];
}
output << "\n";
}
If you don't want to use all that RAM, you can do it one line at a time. You should be able to keep all files open at once, and just store the ifstream
objects in a vector/array/list. In that case, you just read one line at a time from each file and write it to the output.