What's the fastest way to merge multiple csv files by column?

长情又很酷 2021-02-09 04:08

I have about 50 CSV files with 60,000 rows in each, and a varying number of columns. I want to merge all the CSV files by column. I've tried doing this in MATLAB by transposing

5 answers
  •  忘掉有多难
    2021-02-09 04:42

    Horizontal concatenation really is trivial. Considering you know C++, I'm surprised you used MATLAB. Processing a GB or so of data the way you're doing should take on the order of seconds, not days.

    By your description, no CSV processing is actually required. The easiest approach is to just do it in RAM.

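    // Assumes <fstream>, <string>, <vector> and `using namespace std;`.
    // data[i] holds every line of file i, kept as unparsed text.
    vector< vector<string> > data( num_files );
    
    for( int i = 0; i < num_files; i++ ) {
        ifstream input( filename[i] );
        string line;
        while( getline(input, line) ) data[i].push_back(line);
    }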
    

    (Do obvious sanity checks, such as making sure all vectors are the same length...)
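One way to write that sanity check is sketched below. The function name `same_row_count` is made up for illustration; it only assumes the `data` layout from the snippet above, with one vector of lines per file.

```cpp
#include <string>
#include <vector>

// Returns true when every file contributed the same number of lines.
// An empty `data` (no files at all) is treated as consistent.
bool same_row_count(const std::vector<std::vector<std::string>>& data) {
    for (const auto& rows : data) {
        if (rows.size() != data.front().size()) return false;
    }
    return true;
}
```

If it returns false, it is probably safer to stop and report the mismatch than to write rows with missing columns.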

    Now you have everything, dump it:

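    ofstream output("concatenated.csv");
    
    // num_rows is the common line count, e.g. data[0].size()
    for( int row = 0; row < num_rows; row++ ) {
        for( int f = 0; f < num_files; f++ ) {
            if( f > 0 ) output << ",";   // comma between files, none before the first
            output << data[f][row];
        }
        output << "\n";
    }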
    

    If you don't want to use all that RAM, you can do it one line at a time. You should be able to keep all files open at once, and just store the ifstream objects in a vector/array/list. In that case, you just read one line at a time from each file and write it to the output.
