Question
I have written code using numpy that takes an array of size (m x n)... The rows (m) are individual observations comprised of (n) features... and creates a square distance matrix of size (m x m). This distance matrix is the distance of a given observation from all other observations. E.g. row 0 column 9 is the distance between observation 0 and observation 9.
import numpy as np
#import cupy as np

def l1_distance(arr):
    return np.linalg.norm(arr, 1)

X = np.random.randint(low=0, high=255, size=(700, 4096))
distance = np.empty((700, 700))

for i in range(700):
    for j in range(700):
        distance[i, j] = l1_distance(X[i, :] - X[j, :])
I attempted this on GPU using cupy by uncommenting the second import statement, but obviously the double for loop is drastically inefficient. NumPy takes approx 6 seconds, but cupy takes 26 seconds. I understand why, but it's not immediately clear to me how to parallelize this process.
I know I'm going to need to write a reduction kernel of some sort, but I can't think of how to construct one cupy array from iterative operations on elements of another array.
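For reference, the per-pair quantity the double loop computes is just the L1 norm of the difference vector, i.e. the sum of absolute differences. A minimal check that `np.linalg.norm(v, 1)` on a 1-D vector is equivalent to `np.abs(v).sum()`:

```python
import numpy as np

a = np.array([1.0, 4.0, 2.0])
b = np.array([3.0, 1.0, 2.0])

# For a 1-D array, np.linalg.norm(v, 1) is the L1 norm: sum of |v_i|
d = np.linalg.norm(a - b, 1)

assert d == np.abs(a - b).sum()  # |1-3| + |4-1| + |2-2| = 5.0
```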
Answer 1:
Using broadcasting, CuPy takes 0.10 seconds on an A100 GPU, compared to 6.6 seconds for NumPy:
for i in range(700):
    distance[i, :] = np.abs(np.broadcast_to(X[i, :], X.shape) - X).sum(axis=1)
This vectorizes the inner loop: each iteration computes the distances from one observation to all others in parallel.
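A small sketch checking the broadcast version against the naive double loop, using reduced sizes so it runs instantly. It also shows a fully vectorized variant that removes the outer loop as well; note that this builds an (m, m, n) intermediate array, which for the original 700×4096 problem would need tens of gigabytes, so the row-wise loop above is the practical choice at that scale:

```python
import numpy as np

# Small sizes so the check is cheap; with cupy imported as np,
# the same code runs on the GPU.
m, n = 8, 16
X = np.random.randint(0, 255, size=(m, n)).astype(np.float64)

# Naive double loop (reference)
ref = np.empty((m, m))
for i in range(m):
    for j in range(m):
        ref[i, j] = np.abs(X[i] - X[j]).sum()

# Row-wise broadcasting, as in the answer
dist = np.empty((m, m))
for i in range(m):
    dist[i, :] = np.abs(np.broadcast_to(X[i, :], X.shape) - X).sum(axis=1)

# Fully vectorized variant: broadcasts to an (m, m, n) intermediate,
# so it only fits in memory for modest m and n.
full = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)

assert np.allclose(ref, dist) and np.allclose(ref, full)
```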
Source: https://stackoverflow.com/questions/64476655/using-cupy-to-create-a-distance-matrix-from-another-matrix-on-gpu