Why is my Python NumPy code faster than C++?

隐瞒了意图╮ 2021-01-01 03:54

Why is this Python NumPy code faster than my equivalent C++ code?

import numpy as np
import time

k_max = 40000
N = 10000

data = np.zeros((2,N))
coefs = np.zeros((k_max,2),dtype=float)

t1 = time.time()
for k in range(1, k_max + 1):
    # accumulate the k-th coefficient pair over the N-1 intervals
    coefs[k-1, 0] = np.sum(data[1, :-1] * (np.cos(k * data[0, :-1]) - np.cos(k * data[0, 1:])))
    coefs[k-1, 1] = np.sum(data[1, :-1] * (np.sin(k * data[0, :-1]) - np.sin(k * data[0, 1:])))
t2 = time.time()
print('Elapsed:', t2 - t1, 's')

3 Answers
  •  孤城傲影
    2021-01-01 04:44

    On my computer, your (current) Python code runs in 14.82 seconds (yes, my computer's quite slow).

    I rewrote your C++ code to something I'd consider halfway reasonable (basically, I almost ignored your C++ code and just rewrote your Python into C++). That gave me this:

    #include <iostream>
    #include <vector>
    #include <cassert>
    #include <cmath>
    #include <chrono>
    #include <cstddef>
    
    const unsigned int k_max = 40000;
    const unsigned int N = 10000;
    
    template <class T>
    class matrix2 {
        std::vector<T> data;
        size_t cols;
        size_t rows;
    public:
        matrix2(size_t y, size_t x) : data(x*y), cols(x), rows(y) {}
        T &operator()(size_t y, size_t x) {
            assert(x < cols);
            assert(y < rows);
            return data[y*cols + x];
        }
    
        T operator()(size_t y, size_t x) const {
            assert(x < cols);
            assert(y < rows);
            return data[y*cols + x];
        }
    };
    
    int main() {
        matrix2<double> data(N, 2);
        matrix2<double> coeffs(k_max, 2);
    
        using namespace std::chrono;
    
        auto start = high_resolution_clock::now();
    
        for (int k = 0; k < k_max; k++) {
            for (int j = 0; j < N - 1; j++) {
                coeffs(k, 0) += data(j, 1) * (cos((k + 1)*data(j, 0)) - cos((k + 1)*data(j+1, 0)));
                coeffs(k, 1) += data(j, 1) * (sin((k + 1)*data(j, 0)) - sin((k + 1)*data(j+1, 0)));
            }
        }
    
        auto end = high_resolution_clock::now();
        std::cout << duration_cast<milliseconds>(end - start).count() << " ms\n";
    }
    

    This ran in about 14.4 seconds, so it's a slight improvement over the Python version--but given that the Python is mostly a pretty thin wrapper around some C code, getting only a slight improvement is pretty much what we should expect.
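
    To see why the two timings land so close together, note that essentially all of the work in both versions is evaluating sin() and cos(), and NumPy runs those through compiled C loops much like the C++ build does. Below is a rough estimate sketch of my own (the sample size, scaling, and program structure are my assumptions, not anything from the original code): it times raw sin/cos throughput and scales it up to the k_max * (N - 1) iterations of the real loop. If that estimate accounts for most of the measured time, there simply isn't much left for plain single-threaded C++ to win back.

    // Rough cost estimate (my own sketch, not part of the original code):
    // time a sample of cos()+sin() pairs and scale up to the full workload.
    #include <chrono>
    #include <cmath>
    #include <cstddef>
    #include <iostream>

    int main() {
        const std::size_t calls = 10000000;  // sample size for the estimate
        double sink = 0.0;                   // printed later so the loop isn't optimized away

        auto start = std::chrono::high_resolution_clock::now();
        for (std::size_t i = 0; i < calls; i++)
            sink += std::cos(1e-4 * i) + std::sin(1e-4 * i);
        auto end = std::chrono::high_resolution_clock::now();

        double per_pair = std::chrono::duration<double>(end - start).count() / calls;

        // The real loop runs k_max * (N - 1) iterations, each evaluating two
        // cos() and two sin() calls, i.e. two of the pairs timed above.
        const double k_max = 40000.0, N = 10000.0;
        std::cout << "estimated time spent in sin/cos: "
                  << per_pair * 2.0 * k_max * (N - 1) << " s"
                  << "   (checksum: " << sink << ")\n";
    }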

    The next obvious step would be to use multiple cores. To do that in C++, we can add this line:

    #pragma omp parallel for
    

    ...before the outer for loop:

    #pragma omp parallel for
    for (int k = 0; k < k_max; k++) {
        for (int j = 0; j < N - 1; j++) {
            coeffs(k, 0) += data(j, 1) * (cos((k + 1)*data(j, 0)) - cos((k + 1)*data(j+1, 0)));
            coeffs(k, 1) += data(j, 1) * (sin((k + 1)*data(j, 0)) - sin((k + 1)*data(j+1, 0)));
        }
    }
    

    With OpenMP enabled on the compiler's command line (e.g., -fopenmp for GCC/Clang or /openmp for MSVC), this ran in about 4.8 seconds. If you have more than 4 cores, you can probably expect a larger improvement than that (conversely, with fewer than 4 cores, expect a smaller improvement--but nowadays, more than 4 is a lot more common than fewer).
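
    One more note on that pragma: each iteration of the outer loop writes only its own coeffs(k, 0) and coeffs(k, 1), so the parallel for needs no synchronization. How much it helps depends on the thread count OpenMP picks, which you can check with a tiny program like the sketch below (my own addition; the compile command in the comment is just one example). omp_get_max_threads(), omp_get_thread_num(), omp_get_num_threads(), and the OMP_NUM_THREADS environment variable are all standard OpenMP.

    // Sanity-check the OpenMP setup (my own sketch, not from the code above).
    // Build with OpenMP enabled, e.g.: g++ -O2 -fopenmp omp_check.cpp
    #include <omp.h>
    #include <iostream>

    int main() {
        // Upper bound on the team size a parallel region may use; normally the
        // core count unless OMP_NUM_THREADS overrides it.
        std::cout << "max threads: " << omp_get_max_threads() << '\n';

        #pragma omp parallel
        {
            // One line per thread; the number of lines printed is the team
            // size the parallel for above will also get.
            #pragma omp critical
            std::cout << "thread " << omp_get_thread_num()
                      << " of " << omp_get_num_threads() << '\n';
        }
    }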
