Calculate squared Euclidean distance matrix on GPU

佛祖请我去吃肉 2021-02-10 16:02

Let p be a matrix of first set of locations where each row gives the coordinates of a particular point. Similarly, let q be a matrix of second set of l

1 answer
  •  心在旅途
    2021-02-10 16:31

    The problem looks simple enough to make a library overkill.

    Without knowing the range of i and j, I'd suggest you partition k into blocks, each handled by a multiple of 32 threads, and in each block compute:

    float sum, temp, myp[d];   // d must be a compile-time constant
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    // each thread caches its own row i of p
    for ( int kk = 0 ; kk < d ; kk++ )
        myp[kk] = p(i,kk);
    // each block covers blockDim.y consecutive columns j of k
    for ( int j = blockIdx.y*blockDim.y ; j < (blockIdx.y+1)*blockDim.y ; j++ ) {
        sum = 0.0f;
        #pragma unroll
        for ( int kk = 0 ; kk < d ; kk++ ) {
            temp = myp[kk] - q(j,kk);
            sum += temp*temp;
            }
        k(i,j) = sum;
        }
    

    where I am assuming that your data has d dimensions and writing p(i,kk), q(j,kk) and k(i,j) to mean an access to a two-dimensional array. I also took the liberty of assuming that your data is of type float.

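    Note that the best loop order depends on how k is stored. As written, consecutive threads write consecutive values of i for a fixed j, which gives coalesced writes when k is column-major. If k is row-major, you may want to assign j to the threads and loop over i per thread instead.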
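    For reference, here is a minimal self-contained sketch of the idea above. Everything beyond the loop structure is an assumption of mine, not something given in the question: the names `sqdist_kernel`, `run_sqdist_check` and `cols_per_block`, the fixed dimension `D = 3`, row-major storage for p and q, and column-major storage for k. It also adds the bounds checks that the snippet above leaves out.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int D = 3;  // assumed dimensionality, fixed at compile time

// Bail out of the host helper on any CUDA error (sketch-level handling only).
#define CUDA_OK(call) if ((call) != cudaSuccess) return -1.0f

// p: n x D row-major, q: m x D row-major, k: n x m column-major (k[i + j*n]).
// One thread per row i; each block in y handles cols_per_block columns j.
__global__ void sqdist_kernel(const float* p, const float* q, float* k,
                              int n, int m, int cols_per_block)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                       // guard the last partial block

    float myp[D];                             // this thread's point from p
    for (int kk = 0; kk < D; kk++)
        myp[kk] = p[i * D + kk];

    int j_begin = blockIdx.y * cols_per_block;
    int j_end   = min(j_begin + cols_per_block, m);
    for (int j = j_begin; j < j_end; j++) {
        float sum = 0.0f;
        #pragma unroll
        for (int kk = 0; kk < D; kk++) {
            float temp = myp[kk] - q[j * D + kk];
            sum += temp * temp;
        }
        // Consecutive threads write consecutive i: coalesced for column-major k.
        k[i + (size_t)j * n] = sum;
    }
}

// Runs the kernel on synthetic data and returns the largest absolute
// difference from a CPU reference, or -1 if a CUDA call fails.
float run_sqdist_check(int n, int m)
{
    std::vector<float> hp(n * D), hq(m * D), hk((size_t)n * m);
    for (int t = 0; t < n * D; t++) hp[t] = 0.001f * t;
    for (int t = 0; t < m * D; t++) hq[t] = 1.0f - 0.002f * t;

    float *dp = nullptr, *dq = nullptr, *dk = nullptr;
    CUDA_OK(cudaMalloc(&dp, hp.size() * sizeof(float)));
    CUDA_OK(cudaMalloc(&dq, hq.size() * sizeof(float)));
    CUDA_OK(cudaMalloc(&dk, hk.size() * sizeof(float)));
    CUDA_OK(cudaMemcpy(dp, hp.data(), hp.size() * sizeof(float), cudaMemcpyHostToDevice));
    CUDA_OK(cudaMemcpy(dq, hq.data(), hq.size() * sizeof(float), cudaMemcpyHostToDevice));

    const int threads = 128;                  // a multiple of 32
    const int cols_per_block = 32;
    dim3 grid((n + threads - 1) / threads, (m + cols_per_block - 1) / cols_per_block);
    sqdist_kernel<<<grid, threads>>>(dp, dq, dk, n, m, cols_per_block);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpy(hk.data(), dk, hk.size() * sizeof(float), cudaMemcpyDeviceToHost));
    cudaFree(dp); cudaFree(dq); cudaFree(dk);

    float max_err = 0.0f;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++) {
            float ref = 0.0f;
            for (int kk = 0; kk < D; kk++) {
                float t = hp[i * D + kk] - hq[j * D + kk];
                ref += t * t;
            }
            max_err = fmaxf(max_err, fabsf(ref - hk[i + (size_t)j * n]));
        }
    return max_err;
}

int main()
{
    printf("max abs error vs CPU: %g\n", run_sqdist_check(100, 70));
    return 0;
}
```

    With row-major k you would swap the roles of i and j in the kernel (threads indexed by j, loop over i) so that the writes stay coalesced.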
