sparse-matrix

How to make TF-IDF matrix dense?

Submitted by 被刻印的时光 ゝ on 2020-08-17 04:58:22
Question: I am using TfidfVectorizer to convert a collection of raw documents to a matrix of TF-IDF features, which I then plan to feed into a k-means algorithm (which I will implement myself). In that algorithm I will have to compute distances between centroids (categories of articles) and data points (articles). I am going to use Euclidean distance, so I need these two entities to have the same dimension, in my case max_features. Here is what I have: tfidf = TfidfVectorizer(max_features=10, strip_accents=…
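The matrix that fit_transform returns is a scipy.sparse CSR matrix; calling toarray() on it gives a plain dense numpy array. Below is a minimal sketch of that conversion (the corpus and parameter values are made up for illustration):

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat on the mat",          # hypothetical corpus
            "the dog chased the cat"]
    tfidf = TfidfVectorizer(max_features=10, strip_accents="unicode")
    X_sparse = tfidf.fit_transform(docs)       # scipy.sparse CSR matrix, shape (n_docs, n_features)
    X_dense = X_sparse.toarray()               # dense numpy ndarray, ready for Euclidean distances
    print(X_dense.shape)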

Read a file as SciPy sparse matrix directly

Submitted by 跟風遠走 on 2020-08-11 11:06:18
Question: Is it possible to read a space-separated file, with each line containing float numbers, directly as a SciPy sparse matrix? Answer 1: Given: a space-separated file containing ~56 million rows, with 25 space-separated floating-point numbers in each row and a lot of sparsity. Output: convert the file into a SciPy CSR sparse matrix as fast as possible. There may be better solutions out there, but this solution worked for me after a lot of suggestions from @CJR (some of which I couldn't take into…
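This is not the accepted answer's exact code, but one common pattern is the rough sketch below: read the file in bounded chunks with numpy.loadtxt and convert each chunk to CSR before stacking, so the full dense array never has to live in memory at once (the file path and chunk size are placeholders):

    import numpy as np
    from scipy import sparse

    def read_as_csr(path, chunk_rows=1_000_000):
        blocks = []
        with open(path) as f:
            while True:
                # take at most chunk_rows lines from the file iterator
                lines = [line for _, line in zip(range(chunk_rows), f)]
                if not lines:
                    break
                chunk = np.loadtxt(lines, dtype=np.float64, ndmin=2)
                blocks.append(sparse.csr_matrix(chunk))   # zeros are dropped here
        return sparse.vstack(blocks, format="csr")

    # A = read_as_csr("data.txt")   # hypothetical file name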

Sparse Matrix Vs Dense Matrix Multiplication C++ Tensorflow

Submitted by 泪湿孤枕 on 2020-08-10 19:11:32
Question: I would like to write sparse matrix dense vector (SpMV) multiplication in C++ Tensorflow: y = Ax. The sparse matrix A is stored in CSR format. The usual sparsity of A is between 50-90%. The goal is to reach a better or similar time than that of dense matrix dense vector (DMV) multiplication. Please note that I have already viewed the following posts: Q1, Q2, Q3. However, I am still wondering about the following: how does SpMV multiplication compare in terms of time to DMV? Since sparsity is…
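For intuition about the access pattern (written in Python with scipy rather than the question's C++/TensorFlow setting), CSR SpMV walks each row's slice of data and indices, as in this sketch; scipy's compiled A @ x does the same thing and is a reasonable baseline to time against a dense matvec:

    import numpy as np
    from scipy import sparse

    def csr_spmv(data, indices, indptr, x):
        """y = A @ x for a CSR matrix given by (data, indices, indptr)."""
        y = np.zeros(len(indptr) - 1)
        for row in range(len(y)):
            start, end = indptr[row], indptr[row + 1]
            # dot product over the stored (non-zero) entries of this row only
            y[row] = np.dot(data[start:end], x[indices[start:end]])
        return y

    A = sparse.random(1000, 1000, density=0.3, format="csr", random_state=42)
    x = np.random.default_rng(0).standard_normal(1000)
    assert np.allclose(csr_spmv(A.data, A.indices, A.indptr, x), A @ x)

At 50-90% density, the per-element index indirection of CSR often outweighs what is saved by skipping zeros, which is one reason a well-tuned dense matvec can win in that regime.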

scipy sparse matrix sum results in a dense matrix - how to enforce result sparseness?

Submitted by 北城余情 on 2020-07-09 12:50:07
Question: Summing over one axis of a scipy.sparse.csr_matrix results in a numpy.matrix object. Given that my sparse matrix is really sparse, I find this behaviour extremely annoying. Here is an example:

    dense = [[ 0., 0., 0., 0., 0.],
             [ 1., 0., 0., 0., 0.],
             [ 0., 0., 0., 0., 0.],
             [ 0., 0., 0., 0., 0.],
             [ 2., 0., 4., 0., 0.]]
    from scipy.sparse import csr_matrix
    sparse = csr_matrix(dense)
    print(sparse.sum(1))

with the result:

    matrix([[ 0.],
            [ 1.],
            [ 0.],
            [ 0.],
            [ 6.]])

How can I enforce sparseness in the sum…
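One way to keep the row sums sparse (a sketch of a common workaround, not necessarily the answer given in the thread) is to express the sum as a sparse-times-sparse product with a sparse column of ones, which yields a csr_matrix instead of a numpy.matrix:

    import numpy as np
    from scipy.sparse import csr_matrix

    mat = csr_matrix([[0., 0., 0., 0., 0.],
                      [1., 0., 0., 0., 0.],
                      [0., 0., 0., 0., 0.],
                      [0., 0., 0., 0., 0.],
                      [2., 0., 4., 0., 0.]])
    ones = csr_matrix(np.ones((mat.shape[1], 1)))
    row_sums = mat @ ones              # sparse (5, 1) result; only the two non-zero rows are stored
    print(row_sums.nnz)                # 2
    print(row_sums.toarray().ravel())  # [0. 1. 0. 0. 6.]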

scipy csr_matrix: understand indptr

Submitted by 青春壹個敷衍的年華 on 2020-07-04 05:32:38
Question: Every once in a while I have to manipulate a csr_matrix, but I always forget how the parameters indices and indptr work together to build a sparse matrix. I am looking for a clear and intuitive explanation of how indptr interacts with both the data and indices parameters when defining a sparse matrix using the notation csr_matrix((data, indices, indptr), [shape=(M, N)]). I can see from the scipy documentation that the data parameter contains all the non-zero data, and the indices…
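As a small made-up example, indptr holds, for each row, the offset into data and indices where that row starts, so row i's non-zeros are data[indptr[i]:indptr[i+1]] at columns indices[indptr[i]:indptr[i+1]]:

    from scipy.sparse import csr_matrix

    data    = [10, 20, 30, 40]   # the non-zero values, stored row by row
    indices = [0, 2, 1, 2]       # the column of each value
    indptr  = [0, 2, 2, 4]       # row i owns the slice indptr[i]:indptr[i+1]

    A = csr_matrix((data, indices, indptr), shape=(3, 3))
    print(A.toarray())
    # [[10  0 20]    row 0: slice 0:2 -> columns 0 and 2
    #  [ 0  0  0]    row 1: slice 2:2 -> empty row
    #  [ 0 30 40]]   row 2: slice 2:4 -> columns 1 and 2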

Finding smallest eigenvectors of large sparse matrix, over 100x slower in SciPy than in Octave

Submitted by 左心房为你撑大大i on 2020-06-13 19:11:21
Question: I am trying to compute a few (5-500) eigenvectors corresponding to the smallest eigenvalues of large symmetric square sparse matrices (up to 30000x30000) with less than 0.1% of the values being non-zero. I am currently using scipy.sparse.linalg.eigsh in shift-invert mode (sigma=0.0), which I figured out through various posts on the topic is the preferred solution. However, it takes up to 1 hour to solve the problem in most cases. On the other hand, the function is very fast if I ask for the largest…
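For reference, the shift-invert call described above looks roughly like the sketch below, with a small random symmetric positive definite matrix standing in for the questioner's 30000x30000 case; sigma=0.0 with the default which='LM' makes eigsh return the eigenvalues closest to the shift, i.e. the smallest ones here:

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import eigsh

    # Stand-in matrix: symmetric positive definite, so the smallest eigenvalues are well defined.
    n = 2000
    B = sparse.random(n, n, density=0.001, format="csr", random_state=42)
    A = B @ B.T + 1e-3 * sparse.identity(n, format="csr")

    # Shift-invert mode: factorizes (A - sigma*I) and finds the eigenvalues nearest sigma.
    vals, vecs = eigsh(A, k=5, sigma=0.0, which="LM")
    print(vals)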