When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.
from scipy import sparse

def store_sparse_matrix(self):
    csr = M.tocsr()  # convert once instead of three times
    h5 = self.getFileHandle()
    grp1 = h5.createGroup(self.getGroup(), 'M')
    h5.createArray(grp1, 'data', csr.data)
    h5.createArray(grp1, 'indptr', csr.indptr)
    h5.createArray(grp1, 'indices', csr.indices)

def get_sparse_matrix(self):
    # read() pulls each stored array from disk into a numpy array.
    grp1 = self.getGroup().M
    return sparse.csr_matrix((grp1.data.read(), grp1.indices.read(), grp1.indptr.read()))
The trouble is that the get_sparse_matrix function takes some time (reading from disk), and if I understand it correctly also requires the data to fit into memory.
The only other option seems to be converting the matrix to dense format (a numpy array) and then using pytables normally. However, this seems rather inefficient, although I suppose pytables might handle the compression itself?
Borrowing from Storing numpy sparse matrix in HDF5 (PyTables), you can marshal a CSR or CSC scipy.sparse matrix into a pytables format using its data, indices, and indptr attributes, which are three regular numpy.ndarray objects.
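A minimal sketch of that round trip, assuming the PyTables 3.x API (open_file, create_group, create_array). The function signatures, the group name "M", and the file name are made up for illustration and are not taken from the linked answer. The shape is saved as a group attribute because trailing empty rows or columns cannot be recovered from the three arrays alone.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sparse
import tables


def store_sparse_matrix(h5file, parent, name, matrix):
    """Write `matrix` as its CSR components under a new group `name`."""
    csr = sparse.csr_matrix(matrix)
    group = h5file.create_group(parent, name)
    h5file.create_array(group, "data", csr.data)
    h5file.create_array(group, "indices", csr.indices)
    h5file.create_array(group, "indptr", csr.indptr)
    # Keep the shape as a group attribute (hypothetical layout choice).
    group._v_attrs.shape = csr.shape
    return group


def load_sparse_matrix(group):
    """Rebuild the CSR matrix from the arrays stored in `group`."""
    return sparse.csr_matrix(
        (group.data.read(), group.indices.read(), group.indptr.read()),
        shape=tuple(group._v_attrs.shape),
    )


# Small example matrix whose last two columns are empty.
dense = np.zeros((4, 6))
dense[0, 1] = 1.5
dense[2, 3] = -2.0
original = sparse.csr_matrix(dense)

path = os.path.join(tempfile.mkdtemp(), "sparse_demo.h5")
with tables.open_file(path, mode="w") as h5:
    store_sparse_matrix(h5, h5.root, "M", original)

with tables.open_file(path, mode="r") as h5:
    restored = load_sparse_matrix(h5.root.M)
```

Note that read() still loads all three arrays into memory, so this keeps the file small but does not by itself make the matrix usable out of core.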
Source: https://stackoverflow.com/questions/8895120/using-pytables-which-is-more-efficient-scipy-sparse-or-numpy-dense-matrix