Converting python sparse matrix dict to scipy sparse matrix

Anonymous (unverified), submitted 2019-12-03 08:46:08

Question:

I am using Python's scikit-learn for document clustering, and I have a sparse matrix stored in a dict object.

For example:

    doc_term_dict = {('d1', 't1'): 12,
                     ('d2', 't3'): 10,
                     ('d3', 't2'): 5}
    # from mysql data table
    # <type 'dict'>

I want to use scikit-learn to do the clustering, where the input matrix type is scipy.sparse.csr.csr_matrix.

Example:

    (0, 2164)   0.245793088885
    (0, 2076)   0.205702177467
    (0, 2037)   0.193810934784
    (0, 2005)   0.14547028437
    (0, 1953)   0.153720023365
    ...
    <class 'scipy.sparse.csr.csr_matrix'>

I can't find a way to convert the dict to this CSR matrix (I have never used SciPy).

Answer 1:

Pretty straightforward. First read the dictionary and convert the keys to the appropriate row and column indices. SciPy supports (and recommends for this purpose) the COO (coordinate) format for constructing sparse matrices.

Pass it data, row, and column, where A[row[k], column[k]] = data[k] (for all k) defines the matrix. Then let SciPy do the conversion to CSR.
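To make the triplet layout concrete, here is a minimal sketch using the three entries from the question, already converted to 0-based indices. The explicit shape=(3, 3) is an assumption made for this example; if shape is omitted, SciPy infers it from the largest row and column index.

```python
from scipy.sparse import coo_matrix

# Three parallel sequences: entry k means A[row[k], col[k]] = data[k].
data = [12, 10, 5]
row = [0, 1, 2]
col = [0, 2, 1]

# Build in COO format, then convert to CSR.
coo = coo_matrix((data, (row, col)), shape=(3, 3))
csr = coo.tocsr()

# Densify only to inspect this tiny example.
dense = csr.toarray()
print(dense)
```

Calling toarray() is only sensible for small matrices like this one; a real document-term matrix should stay sparse.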

Please check that I have the rows and columns the way you want them; I might have them transposed. I also assumed that the input is 1-indexed.

My code below prints:

    (0, 0)        12
    (1, 2)        10
    (2, 1)        5

Code:

    #!/usr/bin/env python3
    #http://stackoverflow.com/questions/26335059/converting-python-sparse-matrix-dict-to-scipy-sparse-matrix

    from scipy.sparse import csr_matrix, coo_matrix

    def convert(term_dict):
        ''' Convert a dictionary with elements of form ('d1', 't1'): 12 to a CSR type matrix.
        The element ('d1', 't1'): 12 becomes entry (0, 0) = 12.
        * Conversion from 1-indexed to 0-indexed.
        * d is row
        * t is column.
        '''
        # Create the appropriate format for the COO format.
        data = []
        row = []
        col = []
        for k, v in term_dict.items():
            # Strip the leading letter ('d' or 't') and parse the number.
            r = int(k[0][1:])
            c = int(k[1][1:])
            data.append(v)
            row.append(r-1)
            col.append(c-1)
        # Create the COO-matrix
        coo = coo_matrix((data,(row,col)))
        # Let Scipy convert COO to CSR format and return
        return csr_matrix(coo)

    if __name__=='__main__':
        doc_term_dict = { ('d1','t1'): 12,
                          ('d2','t3'): 10,
                          ('d3','t2'):  5
                        }
        print(convert(doc_term_dict))
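The code above relies on keys of the form 'd<number>' and 't<number>'. If the labels coming out of the database are arbitrary strings, one option is to build explicit label-to-index maps first. The sketch below is an illustration under that assumption, not part of the original answer; the helper name dict_to_csr_with_labels and its return values are hypothetical.

```python
from scipy.sparse import coo_matrix

def dict_to_csr_with_labels(term_dict):
    """Hypothetical helper: map arbitrary (doc, term) labels to indices."""
    # Sort the distinct labels so row/column order is reproducible.
    docs = sorted({d for d, _ in term_dict})
    terms = sorted({t for _, t in term_dict})
    doc_index = {d: i for i, d in enumerate(docs)}
    term_index = {t: j for j, t in enumerate(terms)}

    # Keys and values of a dict iterate in the same order.
    data = list(term_dict.values())
    row = [doc_index[d] for d, _ in term_dict]
    col = [term_index[t] for _, t in term_dict]

    shape = (len(docs), len(terms))
    matrix = coo_matrix((data, (row, col)), shape=shape).tocsr()
    # Return the label lists so indices can be mapped back to labels.
    return matrix, docs, terms

doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}
matrix, docs, terms = dict_to_csr_with_labels(doc_term_dict)
print(matrix)
```

Note that the sort is lexicographic, so a label such as 'd10' would be placed before 'd2'. The returned docs and terms lists record which label each row and column corresponds to.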


Answer 2:

We can make @Unapiedra's (excellent) answer a little more sparse:

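    # Python 2 (uses itervalues/iterkeys).
    # Keys must already be 0-based integer (row, col) tuples.
    from itertools import repeat

    import numpy as np
    from scipy.sparse import csr_matrix

    def _dict_to_csr(term_dict):
        term_dict_v = list(term_dict.itervalues())
        term_dict_k = list(term_dict.iterkeys())
        # Square shape: the largest index on either axis, plus one.
        shape = list(repeat(np.asarray(term_dict_k).max() + 1, 2))
        csr = csr_matrix((term_dict_v, zip(*term_dict_k)), shape=shape)
        return csr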


Answer 3:

Same as @carsonc's answer, but for Python 3.x:

    from scipy.sparse import csr_matrix

    def _dict_to_csr(term_dict):
        # Keys must already be 0-based integer (row, col) tuples.
        term_dict_v = list(term_dict.values())
        rows, cols = zip(*term_dict.keys())
        # The shape must cover the largest index on each axis; the number
        # of stored entries is not a valid bound for either dimension.
        shape = (max(rows) + 1, max(cols) + 1)
        csr = csr_matrix((term_dict_v, (list(rows), list(cols))), shape=shape)
        return csr

