nvprof events “fb_subp0_read_sectors” and “fb_subp1_read_sectors” do not report correct results

杀马特。学长 韩版系。学妹 提交于 2019-12-24 08:49:07

问题


I tried to count the number of DRAM (global memory) accesses for simple vector add kernel.

__global__ void AddVectors(const float* A, const float* B, float* C, int N)
{
    int blockStartIndex  = blockIdx.x * blockDim.x * N;
    int threadStartIndex = blockStartIndex + threadIdx.x;
    int threadEndIndex   = threadStartIndex + ( N * blockDim.x );
    int i;

    for( i=threadStartIndex; i<threadEndIndex; i+=blockDim.x ){
        C[i] = A[i] + B[i];
    }
}

Grid Size = 180 Block size = 128

size of array = 180 * 128 * N floats where N is input parameter (elements per thread)

when N = 1, size of array = 180 * 128 * 1 floats = 90KB

All arrays A, B and C should be read from DRAM.

Therefore theoretically,

DRAM writes (C) = 2880 (32 byte accesses) DRAM reads (A,B) = 2880 + 2880 = 5760 (32 byte accesses)

But when I used nvprof

DRAM writes = fb_subp0_write_sectors + fb_subp1_write_sectors = 1440 + 1440 = 2880 (32 byte accesses) DRAM reads = fb_subp0_read_sectors + fb_subp1_read_sectors = 23 + 7 = 30 (32 byte accesses)

Now this is the problem. Theoretically there should be 5760 DRAM reads, but nvprof only reports 30, for me this looks impossible. Further more, if you double the size of the vector (N = 2), still the reported DRAM accesses remains at 30.

It would be great, if someone can shed some light.

I have disabled the L1 cache by using compiler option "-Xptxas -dlcm=cg"

Thanks, Waruna


回答1:


If you have done cudaMemcpy before the kernel launch to copy the source buffers from host to device, that gets the source buffers in L2 cache and hence the kernel doesn't see any misses from L2 for reads and you get less number of (fb_subp0_read_sectors + fb_subp1_read_sectors).

If you comment out cudaMemcpy before the kernel launch, you will see that the event values of fb_subp0_read_sectors and fb_subp1_read_sectors include the values you are expecting.



来源:https://stackoverflow.com/questions/21103755/nvprof-events-fb-subp0-read-sectors-and-fb-subp1-read-sectors-do-not-report

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!