Using pytables, which is more efficient: scipy.sparse or numpy dense matrix?

Posted by 混江龙づ霸主 on 2019-12-04 11:55:47

Question


When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.

from scipy import sparse

def store_sparse_matrix(self, M):
    # Convert to CSR once, then store its three underlying arrays.
    # create_group/create_array are the PyTables 3 names for createGroup/createArray.
    csr = M.tocsr()
    h5 = self.getFileHandle()
    grp1 = h5.create_group(self.getGroup(), 'M')
    h5.create_array(grp1, 'data', csr.data)
    h5.create_array(grp1, 'indptr', csr.indptr)
    h5.create_array(grp1, 'indices', csr.indices)

def get_sparse_matrix(self):
    # read() loads each on-disk array into memory as a numpy.ndarray.
    grp1 = self.getGroup().M
    return sparse.csr_matrix((grp1.data.read(), grp1.indices.read(), grp1.indptr.read()))

The trouble is that the get_sparse_matrix function takes some time (it reads from disk) and, if I understand correctly, also requires the data to fit into memory.

The only other option seems to be to convert the matrix to dense format (a numpy array) and then use pytables normally. That seems rather inefficient, although perhaps pytables will deal with the compression itself?


Answer 1:


Borrowing from "Storing numpy sparse matrix in HDF5 (PyTables)", you can marshal a scipy.sparse matrix into a pytables format using its data, indices, and indptr attributes, which are three regular numpy.ndarray objects.
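As a rough illustration of that idea, here is a minimal sketch of a round trip through an HDF5 file. The helper names store_csr and load_csr, the csr_shape attribute, and the file name are made up for this example and are not part of PyTables or scipy. It also stores the matrix shape, on the assumption that you want it back exactly, since the three arrays alone cannot recover trailing empty columns.

```python
import os
import tempfile

import numpy as np
import tables
from scipy import sparse


def store_csr(h5file, where, name, matrix):
    """Hypothetical helper: store a sparse matrix as a group of CSR arrays."""
    csr = matrix.tocsr()
    group = h5file.create_group(where, name)
    for attr in ("data", "indices", "indptr"):
        h5file.create_array(group, attr, getattr(csr, attr))
    # Keep the shape as a group attribute (attribute name is an assumption).
    group._v_attrs.csr_shape = np.asarray(csr.shape, dtype=np.int64)
    return group


def load_csr(group):
    """Hypothetical helper: rebuild the matrix; read() loads into memory."""
    shape = tuple(int(n) for n in group._v_attrs.csr_shape)
    return sparse.csr_matrix(
        (group.data.read(), group.indices.read(), group.indptr.read()),
        shape=shape,
    )


def demo():
    # Small random matrix, for illustration only.
    original = sparse.random(50, 40, density=0.05, format="csr", random_state=0)
    path = os.path.join(tempfile.mkdtemp(), "sparse_demo.h5")
    with tables.open_file(path, mode="w") as h5:
        store_csr(h5, h5.root, "M", original)
    with tables.open_file(path, mode="r") as h5:
        restored = load_csr(h5.root.M)
    return original, restored
```

Note that load_csr still reads all three arrays fully into memory. If only a block of rows is needed, one option is to read the matching slice of indptr first and then only the corresponding range of data and indices.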



Source: https://stackoverflow.com/questions/8895120/using-pytables-which-is-more-efficient-scipy-sparse-or-numpy-dense-matrix
