Faster way to create tab-delimited text files?

忘掉有多难 2021-02-09 18:01

Many of my programs output huge volumes of data for me to review in Excel. The best way to view all these files is to use a tab-delimited text format. Currently I use this chu

6 answers
  •  Happy的楠姐
    2021-02-09 18:26

    I decided to test JPvdMerwe's claim that C stdio is faster than C++ IO streams. (Spoiler: yes, but not necessarily by much.) To do this, I used the following test programs:

    Common wrapper code, omitted from the programs below:

    #include <iostream>  // std::cout (programs 1 and 2)
    #include <cstdio>    // printf (program 3)
    int main (void) {
      // program code goes here
    }
    

    Program 1: normal synchronized C++ IO streams

    for (int j = 0; j < ROWS; j++) {
      for (int i = 0; i < COLS; i++) {
        std::cout << (i-j) << "\t";
      }
      std::cout << "\n";
    }
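For reference, here is a sketch of program 1's loop wrapped in a function that takes any output stream, so the exact output can be inspected on a small table. The names `write_table` and `format_table` are hypothetical; calling `write_table(std::cout, ROWS, COLS)` inside the wrapper's `main` should reproduce the benchmark. Note that every row ends with a tab before the newline, which a spreadsheet may read as one extra empty column.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Program 1's loop, writing to any std::ostream instead of only std::cout.
void write_table(std::ostream& out, int rows, int cols) {
    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            out << (i - j) << "\t";  // a tab after every value, including the last
        }
        out << "\n";                 // "\n" rather than std::endl: no flush per row
    }
}

// Convenience helper: render a small table into a string for inspection.
std::string format_table(int rows, int cols) {
    std::ostringstream out;
    write_table(out, rows, cols);
    return out.str();
}
```

For example, `format_table(2, 3)` yields `"0\t1\t2\t\n-1\t0\t1\t\n"`.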
    

    Program 2: unsynchronized C++ IO streams

    Same as program 1, except with std::cout.sync_with_stdio(false); prepended.
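A sketch of program 2 as a function (the name `write_table_unsynced` is hypothetical). `sync_with_stdio` is a static member of `std::ios_base`, so calling it through `std::cout` as described is equivalent to the call below. It should run before the program performs any other I/O on the standard streams; calling it afterwards has implementation-defined effects. It returns the synchronization setting that was in effect before the call.

```cpp
#include <iostream>

// Program 2: the same loop as program 1, with the C++ standard streams
// unsynchronized from C stdio. Call this before any other I/O in the program.
void write_table_unsynced(int rows, int cols) {
    // Turning synchronization off lets the library buffer std::cout on its
    // own instead of keeping it in step with C's stdout on every operation.
    std::ios_base::sync_with_stdio(false);
    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            std::cout << (i - j) << "\t";
        }
        std::cout << "\n";
    }
}
```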

    Program 3: C stdio printf()

    for (int j = 0; j < ROWS; j++) {
      for (int i = 0; i < COLS; i++) {
        printf("%d\t", i-j);
      }
      printf("\n");
    }
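A sketch of program 3 generalized to write to any `FILE*` (the function name is hypothetical); passing `stdout` reproduces the benchmark, and passing a file opened with `fopen` writes the tab-delimited file directly. Unlike the original, it also checks the return value of each `fprintf` call and reports the number of characters written.

```cpp
#include <cstdio>

// Program 3: the same loop using C stdio, writing to the given stream.
// Returns the number of characters written, or -1 if any write fails.
long write_table_stdio(std::FILE* out, int rows, int cols) {
    long total = 0;
    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            int n = std::fprintf(out, "%d\t", i - j);
            if (n < 0) return -1;
            total += n;
        }
        int n = std::fprintf(out, "\n");
        if (n < 0) return -1;
        total += n;
    }
    return total;
}
```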
    

    All programs were compiled with GCC 4.8.4 on Ubuntu Linux, using the following command:

    g++ -Wall -ansi -pedantic -DROWS=10000 -DCOLS=1000 prog.cpp -o prog
    

    and timed using the command:

    time ./prog > /dev/null
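If you would rather time one variant from inside the program (for instance, to leave out process start-up), `std::chrono::steady_clock` is one option. This is a swapped-in technique, not what the answer used, and `time_seconds` is a hypothetical helper; like `time`, it measures wall-clock time.

```cpp
#include <chrono>
#include <utility>

// Runs f() once and returns the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so the result is never negative.
template <typename F>
double time_seconds(F&& f) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

When timing output code this way, flush the stream (`std::cout.flush()` or `std::fflush(stdout)`) inside the timed lambda; otherwise the last buffered block is written at exit, outside the measured interval. Report the result on `std::cerr` so redirecting stdout to `/dev/null` does not hide it.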
    

    Here are the results of the test on my laptop (measured in wall-clock time):

    • Program 1 (synchronized C++ IO): 3.350s (= 100%)
    • Program 2 (unsynchronized C++ IO): 3.072s (= 92%)
    • Program 3 (C stdio): 2.592s (= 77%)

    I also ran the same test with g++ -O2 to test the effect of optimization, and got the following results:

    • Program 1 (synchronized C++ IO) with -O2: 3.118s (= 100%)
    • Program 2 (unsynchronized C++ IO) with -O2: 2.943s (= 94%)
    • Program 3 (C stdio) with -O2: 2.734s (= 88%)

    (The last line is not a fluke; program 3 consistently runs slower for me with -O2 than without it!)

    Thus, my conclusion is that, based on this test, C stdio is indeed about 10% to 25% faster for this task than (synchronized) C++ IO. Using unsynchronized C++ IO saves about 5% to 10% over synchronized IO, but is still slower than stdio.
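The percentages above can be re-derived from the raw timings. The sketch below (function names hypothetical) makes the arithmetic explicit. For example, 2.592 s / 3.350 s is about 0.774, so stdio took about 77% of the synchronized time, a saving of about 23%; with -O2 the saving is 1 - 2.734/3.118, about 12%, which gives the "10% to 25%" range.

```cpp
#include <cmath>

// Run time t as a percentage of the baseline time, rounded to a whole percent.
long percent_of(double t, double base) {
    return std::lround(100.0 * t / base);
}

// Time saved relative to the baseline, rounded to a whole percent.
long percent_saved(double t, double base) {
    return std::lround(100.0 * (1.0 - t / base));
}
```

Applied to the unsynchronized runs, `percent_saved(3.072, 3.350)` and `percent_saved(2.943, 3.118)` give 8 and 6, consistent with the "5% to 10%" figure.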


    P.S. I tried a few other variations, too:

    • Using std::endl instead of "\n" is, as expected, slightly slower (std::endl also flushes the stream after every line), but the difference is less than 5% for the parameter values given above. However, printing more but shorter lines (e.g. -DROWS=1000000 -DCOLS=10) makes std::endl more than 30% slower than "\n".

    • Redirecting the output to a regular file instead of /dev/null slows all the programs down by about 0.2s, but makes no qualitative difference to the results.

    • Increasing the line count by a factor of 10 also yields no surprises; the programs all take about 10 times longer to run, as expected.

    • Prepending std::cout.sync_with_stdio(false); to program 3 has no noticeable effect.

    • Using (double)(i-j) (and "%g\t" for printf()) slows down all three programs a lot! Notably, program 3 is still fastest, taking only 9.3s where programs 1 and 2 each took a bit over 14s, roughly a third less run time! (And yes, I checked: the outputs are identical.) Using -O2 makes no significant difference either.
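The identical output in the last variant is expected: with default stream flags, `operator<<` formats a `double` the way `printf`'s `%g` does, using the stream's default precision of 6. A sketch that checks this for the benchmark's values (function names are hypothetical; `snprintf` into a buffer stands in for `printf` so the two results can be compared as strings):

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Programs 1/2 variant: stream each value as a double (default flags).
std::string doubles_via_iostream(int rows, int cols) {
    std::ostringstream out;
    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            out << static_cast<double>(i - j) << "\t";
        }
        out << "\n";
    }
    return out.str();
}

// Program 3 variant: format each value with "%g\t".
std::string doubles_via_stdio(int rows, int cols) {
    std::string s;
    char buf[32];
    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            int n = std::snprintf(buf, sizeof buf, "%g\t", static_cast<double>(i - j));
            if (n > 0) s.append(buf, static_cast<std::size_t>(n));
        }
        s += "\n";
    }
    return s;
}
```

Both produce `"0\t1\t\n-1\t0\t\n"` for a 2 x 2 table, since `%g` drops the trailing zeros of whole numbers.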
