Why are std::sin() and std::cos() slower than sin() and cos()?

小蘑菇 2021-01-01 13:14

Test code:

#include 
#include 

const int N = 4096;
const float PI = 3.1415926535897932384626;

float cosine[N][N];
float sine[N][         


        
4 Answers
  • 2021-01-01 13:32

    I guess the difference is that there are overloads for std::sin() for float and for double, while sin() only takes double. Inside std::sin() for floats, there may be a conversion to double, then a call to std::sin() for doubles, and then a conversion of the result back to float, making it slower.

  • 2021-01-01 13:40

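    You're using a different overload. PI is a float, so i*j*2*PI/N is a float as well; with using namespace std; the calls resolve to the float overloads of std::sin() and std::cos(), whereas the plain C sin() and cos() take a double.

    Try computing the angle as a double first: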

            double angle = i*j*2*PI/N;
            cosine[i][j] = cos(angle);
            sine[i][j] = sin(angle);
    

    It should then perform the same with or without using namespace std;

  • 2021-01-01 13:40

    Use the -S flag on the compiler command line and compare the assembler output of the two versions. Maybe using namespace std; pulls a lot of unused code into the executable.

  • 2021-01-01 13:47

    I did some measurements using clang with -O3 optimization, running on an Intel Core i7. I found that:

    • std::sin on float has the same cost as sinf
    • std::sin on double has the same cost as sin
    • The sin functions on double are 2.5x slower than on float (again, running on an Intel Core i7).

    Here is the full code to reproduce it:

    #include <chrono>
    #include <cmath>
    #include <iostream>
    
    template<typename Clock>
    struct Timer
    {
        using rep = typename Clock::rep;
        using time_point = typename Clock::time_point;
        using resolution = typename Clock::duration;
    
        Timer(rep& duration) :
        duration(&duration) {
            startTime = Clock::now();
        }
        ~Timer() {
            using namespace std::chrono;
            *duration = duration_cast<resolution>(Clock::now() - startTime).count();
        }
    private:
    
        time_point startTime;
        rep* duration;
    };
    
    template<typename T, typename F>
    void testSin(F sin_func) {
      using namespace std;
      using namespace std::chrono;
      high_resolution_clock::rep duration = 0;
      T sum {};
      {
        Timer<high_resolution_clock> t(duration);
        for(int i=0; i<100000000; ++i) {
          sum += sin_func(static_cast<T>(i));
        }
      }
      cout << duration << endl;
      cout << "  " << sum << endl;
    }
    
    int main() {
      testSin<float> ([] (float  v) { return std::sin(v); });
      testSin<float> ([] (float  v) { return sinf(v); });
      testSin<double>([] (double v) { return std::sin(v); });
      testSin<double>([] (double v) { return sin(v); });
      return 0;
    }
    

    I'd be interested if people could report their results in the comments for their own architectures, especially the float vs. double timings.
