My OpenCL test does not run much faster than the CPU

南笙 2021-01-13 13:10

I am trying to measure GPU execution time and compare it with CPU execution time. I wrote a simple_add function that adds all elements of a short int vector. The kernel code is:

2 answers
  •  离开以前
    2021-01-13 13:33

    I ran some extra tests and realized that the GPU is optimized for floating-point operations. I changed the test code as below:

        void kernel simple_add(global const int *A, global const uint *B, global int *C)
        {
            // Each 32-bit word packs two 16-bit values: split it into high and low halves.
            size_t i = get_global_id(0);
            int AA = A[i];
            int BB = B[i];
            float AH = 0xFFFF0000 & AA;
            float AL = 0x0000FFFF & AA;
            float BH = 0xFFFF0000 & BB;
            float BL = 0x0000FFFF & BB;
            // Floating-point work on each half, then mask the result back into its half.
            int CL = (int)(AL * cos(AL) + BL * sin(BL)) & 0x0000FFFF;
            int CH = (int)(AH * cos(AH) + BH * sin(BH)) & 0xFFFF0000;
            C[i] = CH | CL;
        }
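The kernel treats every 32-bit word as two packed 16-bit lanes: it masks out the high and low halves, computes on each, masks the results, and ORs them back together. As a minimal host-side C sketch of just that split-and-recombine step, with plain integer addition standing in for the trigonometric expression (an assumption made for illustration; the helper name `add_packed16` is hypothetical and not part of the original code):

```c
#include <stdint.h>

/* Hypothetical helper: add two words that each pack two 16-bit lanes.
 * Each lane is added separately and masked afterwards, so a carry out
 * of the low lane never spills into the high lane. */
static uint32_t add_packed16(uint32_t a, uint32_t b)
{
    uint32_t lo = ((a & 0x0000FFFFu) + (b & 0x0000FFFFu)) & 0x0000FFFFu;
    uint32_t hi = ((a & 0xFFFF0000u) + (b & 0xFFFF0000u)) & 0xFFFF0000u;
    return hi | lo;
}
```

For example, `add_packed16(0x1234FFFFu, 0x00010001u)` wraps the low lane to zero and leaves the high lane at `0x1235`, which is the behaviour the masks in the kernel are there to guarantee.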
    

    and got the result that I expected (the GPU is roughly 13 times faster):

                    CPU time:      741046.665  micro-sec
                    GPU time:       54618.889  micro-sec
                    ----------------------------------------------------
                    CPU time:      741788.112  micro-sec
                    GPU time:       54875.666  micro-sec
                    ----------------------------------------------------
                    CPU time:      739975.979  micro-sec
                    GPU time:       54560.445  micro-sec
                    ----------------------------------------------------
                    CPU time:      755848.937  micro-sec
                    GPU time:       54582.111  micro-sec
                    ----------------------------------------------------
                    CPU time:      724100.716  micro-sec
                    GPU time:       56893.445  micro-sec
                    ----------------------------------------------------
                    CPU time:      744476.351  micro-sec
                    GPU time:       54596.778  micro-sec
                    ----------------------------------------------------
                    CPU time:      727787.538  micro-sec
                    GPU time:       54602.445  micro-sec
                    ----------------------------------------------------
                    CPU time:      731132.939  micro-sec
                    GPU time:       54710.000  micro-sec
                    ----------------------------------------------------
                    CPU time:      727899.150  micro-sec
                    GPU time:       54583.444  micro-sec
                    ----------------------------------------------------
                    CPU time:      727089.880  micro-sec
                    GPU time:       54594.778  micro-sec
                    ----------------------------------------------------
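The host code that produced the CPU timings above is not shown. A minimal sketch of how a CPU time in microseconds might be taken, assuming a single-threaded loop timed with POSIX `clock_gettime` and `CLOCK_MONOTONIC`; the names `time_us`, `add_ctx`, and `add_shorts` are hypothetical, and the element-wise short addition is only a stand-in workload:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stddef.h>
#include <time.h>

/* Current monotonic time in microseconds. */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Run fn(ctx) once and return the elapsed wall-clock time in microseconds. */
static double time_us(void (*fn)(void *), void *ctx)
{
    double start = now_us();
    fn(ctx);
    return now_us() - start;
}

/* Hypothetical stand-in workload: element-wise add of two short vectors. */
struct add_ctx {
    const short *a;
    const short *b;
    short *c;
    size_t n;
};

static void add_shorts(void *p)
{
    struct add_ctx *ctx = p;
    for (size_t i = 0; i < ctx->n; ++i)
        ctx->c[i] = (short)(ctx->a[i] + ctx->b[i]);
}
```

Calling `time_us(add_shorts, &ctx)` with a filled-in `struct add_ctx` returns one CPU measurement. A fair comparison would time the GPU side in the same way, including or excluding buffer transfers consistently.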
    

    For somewhat heavier floating-point operations, like the kernel below:

        void kernel simple_add(global const int *A, global const uint *B, global int *C)
        {
            // Each 32-bit word packs two 16-bit values: split it into high and low halves.
            size_t i = get_global_id(0);
            int AA = A[i];
            int BB = B[i];
            float AH = 0xFFFF0000 & AA;
            float AL = 0x0000FFFF & AA;
            float BH = 0xFFFF0000 & BB;
            float BL = 0x0000FFFF & BB;
            // Six trigonometric terms per operand instead of one.
            int CL = (int)(AL * (cos(AL) + sin(2*AL) + cos(3*AL) + sin(4*AL) + cos(5*AL) + sin(6*AL)) +
                           BL * (cos(BL) + sin(2*BL) + cos(3*BL) + sin(4*BL) + cos(5*BL) + sin(6*BL))) & 0x0000FFFF;
            int CH = (int)(AH * (cos(AH) + sin(2*AH) + cos(3*AH) + sin(4*AH) + cos(5*AH) + sin(6*AH)) +
                           BH * (cos(BH) + sin(2*BH) + cos(3*BH) + sin(4*BH) + cos(5*BH) + sin(6*BH))) & 0xFFFF0000;
            C[i] = CH | CL;
        }
    

    The speedup was more or less the same (roughly 10 times):

                    CPU time:     3905725.933  micro-sec
                    GPU time:      354543.111  micro-sec
                    -----------------------------------------
                    CPU time:     3698211.308  micro-sec
                    GPU time:      354850.333  micro-sec
                    -----------------------------------------
                    CPU time:     3696179.243  micro-sec
                    GPU time:      354302.667  micro-sec
                    -----------------------------------------
                    CPU time:     3692988.914  micro-sec
                    GPU time:      354764.111  micro-sec
                    -----------------------------------------
                    CPU time:     3699645.146  micro-sec
                    GPU time:      354287.666  micro-sec
                    -----------------------------------------
                    CPU time:     3681591.964  micro-sec
                    GPU time:      357071.889  micro-sec
                    -----------------------------------------
                    CPU time:     3744179.707  micro-sec
                    GPU time:      354249.444  micro-sec
                    -----------------------------------------
                    CPU time:     3704143.214  micro-sec
                    GPU time:      354934.111  micro-sec
                    -----------------------------------------
                    CPU time:     3667518.628  micro-sec
                    GPU time:      354809.222  micro-sec
                    -----------------------------------------
                    CPU time:     3714312.759  micro-sec
                    GPU time:      354883.888  micro-sec
                    -----------------------------------------
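To put a number on the two result listings above, the speedup can be taken as total CPU time divided by total GPU time over the ten runs. A minimal C sketch, with the timings copied from the listings; the function name `speedup` and the array names are hypothetical:

```c
#include <stddef.h>

/* Speedup = sum of CPU times / sum of GPU times over n runs. */
static double speedup(const double *cpu_us, const double *gpu_us, size_t n)
{
    double cpu = 0.0, gpu = 0.0;
    for (size_t i = 0; i < n; ++i) {
        cpu += cpu_us[i];
        gpu += gpu_us[i];
    }
    return cpu / gpu;
}

/* First kernel (one cos and one sin per half), times in microseconds. */
static const double light_cpu[10] = {
    741046.665, 741788.112, 739975.979, 755848.937, 724100.716,
    744476.351, 727787.538, 731132.939, 727899.150, 727089.880
};
static const double light_gpu[10] = {
    54618.889, 54875.666, 54560.445, 54582.111, 56893.445,
    54596.778, 54602.445, 54710.000, 54583.444, 54594.778
};

/* Second kernel (six trigonometric terms per operand), times in microseconds. */
static const double heavy_cpu[10] = {
    3905725.933, 3698211.308, 3696179.243, 3692988.914, 3699645.146,
    3681591.964, 3744179.707, 3704143.214, 3667518.628, 3714312.759
};
static const double heavy_gpu[10] = {
    354543.111, 354850.333, 354302.667, 354764.111, 354287.666,
    357071.889, 354249.444, 354934.111, 354809.222, 354883.888
};
```

With these figures, `speedup(light_cpu, light_gpu, 10)` comes out at about 13.4 and `speedup(heavy_cpu, heavy_gpu, 10)` at about 10.5, so making the kernel heavier did not widen the gap on this hardware.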
    
