Cosine similarity for a very large dataset
Question

I am having trouble calculating cosine similarity between a large list of 100-dimensional vectors. When I use from sklearn.metrics.pairwise import cosine_similarity, I get a MemoryError on my 16 GB machine. Each array fits comfortably in memory, but the MemoryError is raised during the internal np.dot() call.

Here is my use case and how I am currently tackling it. I have a 100-dimensional parent vector that I need to compare with 500,000 other vectors of the same dimension (i.e. 100).

parent
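For reference, here is a minimal sketch of the comparison described above. It assumes only the similarity of each vector to the single parent is needed, not the full pairwise matrix (a full 500,000 x 500,000 float64 matrix would take about 2 TB). The function name cosine_to_parent, the chunk_size parameter, and the random data are hypothetical stand-ins, not part of the original post.

```python
import numpy as np

def cosine_to_parent(parent, vectors, chunk_size=50_000):
    """Cosine similarity of every row of `vectors` against one `parent` vector.

    Rows are processed in chunks, so the temporary arrays stay small even
    when `vectors` has hundreds of thousands of rows.
    """
    parent = np.asarray(parent, dtype=np.float64).ravel()
    parent_norm = np.linalg.norm(parent)
    sims = np.empty(len(vectors), dtype=np.float64)

    for start in range(0, len(vectors), chunk_size):
        block = np.asarray(vectors[start:start + chunk_size], dtype=np.float64)
        dots = block @ parent                          # shape: (rows in block,)
        denom = np.linalg.norm(block, axis=1) * parent_norm
        # Zero-length vectors have no direction; report 0.0 for them
        # instead of dividing by zero.
        sims[start:start + len(block)] = np.divide(
            dots, denom, out=np.zeros_like(dots), where=denom > 0
        )
    return sims

# Hypothetical data: a small stand-in for the 500,000 x 100 array in the post.
rng = np.random.default_rng(0)
parent = rng.standard_normal(100)
vectors = rng.standard_normal((20_000, 100))

sims = cosine_to_parent(parent, vectors)
```

The result has one entry per row of vectors, so for 500,000 rows it is a 500,000-element array (about 4 MB as float64). Whether this matches the intended computation depends on the code that follows in the post.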