Calculate squared Euclidean distance matrix on GPU

前端未结

关注

 1  740

佛祖请我去吃肉 2021-02-10 16:02

Let p be a matrix of first set of locations where each row gives the coordinates of a particular point. Similarly, let q be a matrix of second set of l

1条回答

心在旅途 (楼主)

2021-02-10 16:31
The problem looks simple enough to make a library overkill.

Without knowing the range of i and j, I'd suggest you partition k into blocks of a multiple of 32 threads each and in each block, compute
```
float sum, myp[d];
int i = blockIdx.x*blockDim.x + threadIdx.x;
for ( int kk = 0 ; kk < d ; kk++ )
    myp[kk] = p(i,kk);
for ( j = blockIdx.y*blockDim.y ; j < (blockIdx.y+1)*blockDim ; j++ ) {
    sum = 0.0f;
    #pragma unroll
    for ( int kk = 0 ; kk < d ; kk++ ) {
        temp = myp[kk] - q(j,kk);
        sum += temp*temp;
        }
    k(i,j) = sum;
    }
```
where I am assuming that your data has d dimensions and writing p(i,k), q(j,k) and k(i,j) to mean an access to a two-dimensional array. I also took the liberty in assuming that your data is of type float.

Note that depending on how k is stored, e.g. row-major or column-major, you may want to loop over i per thread instead to get coalesced writes to k.
0 讨论(0)
发布评论:

提交评论
- 加载中...