Cosine similarity on a large sparse matrix with NumPy
The code below causes my system to run out of memory before it completes. Can you suggest a more efficient means of computing the cosine similarity on a large matrix, such as the one below?

I would like to have the cosine similarity computed for each of the 65000 rows in my original matrix (mat) relative to all of the others, so that the result is a 65000 x 65000 matrix where each element is the cosine similarity between two rows in the original matrix.

    import numpy as np
    from scipy import sparse
    from sklearn.metrics.pairwise import cosine_similarity

    mat = np.random.rand(65000, 10)
    sparse_mat
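
For scale: a dense 65000 x 65000 result in float64 needs 65000 × 65000 × 8 bytes, roughly 34 GB, so the full matrix is unlikely to fit in memory however it is computed. One possible direction, sketched here under assumptions, is to L2-normalise the rows once and then compute the similarities one block of rows at a time, consuming each block before moving on. The helper name `cosine_similarity_blocks` and the `block_size` parameter are hypothetical and not part of any library; the demo sizes are arbitrary.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize


def cosine_similarity_blocks(mat, block_size=1000):
    """Yield (start, stop, block), where block[i, j] is the cosine
    similarity between row start + i and row j of mat.

    Hypothetical helper: accepts a dense array or a SciPy CSR matrix.
    """
    # Scale every row to unit L2 norm once; after that, cosine similarity
    # is just a dot product between rows.
    unit = normalize(mat, norm="l2", axis=1)
    n_rows = unit.shape[0]
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        block = unit[start:stop] @ unit.T
        # A sparse input gives a sparse product; densify only this block.
        if sparse.issparse(block):
            block = block.toarray()
        yield start, stop, block


if __name__ == "__main__":
    # Small demo: for each row, find the most similar *other* row
    # without ever holding the full similarity matrix.
    rng = np.random.default_rng(0)
    demo = sparse.csr_matrix(rng.random((2000, 10)))
    best = np.empty(demo.shape[0], dtype=np.int64)
    for start, stop, block in cosine_similarity_blocks(demo, block_size=500):
        # Mask each row's similarity with itself before taking the argmax.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        best[start:stop] = block.argmax(axis=1)
```

Each block holds only `block_size × n_rows` values at a time (about 0.5 GB for 1000 x 65000 in float64), so peak memory is controlled by `block_size`. If the complete matrix really is required, the blocks could instead be written to disk as they are produced, for example into a `np.memmap`, though that is a design choice this sketch does not make.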