Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix

借酒劲吻你 2020-11-27 02:41

I noticed Pandas now has support for Sparse Matrices and Arrays. Currently, I create DataFrame()s like this:

return DataFrame(matrix.toarray(),         


        
3 Answers
  • 2020-11-27 03:08

    A much shorter version:

    df = pd.DataFrame(m.toarray())
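
Note that `toarray()` materializes every zero, so this is only practical when the dense array fits in memory. A rough sketch of the size difference follows; the 10,000 x 1,000 shape, the 1% density, and the seed are illustrative assumptions, not values from the question:

```python
import scipy.sparse as sp

# Hypothetical matrix: 10,000 x 1,000 with 1% non-zero entries.
m = sp.random(10_000, 1_000, density=0.01, format="csr", random_state=0)

# Bytes a dense float64 array of the same shape would need (computed, not allocated).
dense_bytes = m.shape[0] * m.shape[1] * m.dtype.itemsize

# Bytes actually held by the CSR representation: values plus the two index arrays.
sparse_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

print(f"dense: {dense_bytes / 1e6:.1f} MB, sparse: {sparse_bytes / 1e6:.1f} MB")
```

The dense size is computed from the shape rather than by calling `toarray()`, so the sketch itself stays cheap to run.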
    
  • 2020-11-27 03:09

    As of pandas v0.20.0 you can pass a SciPy sparse matrix directly to the SparseDataFrame constructor.

    An example from the pandas docs:

    import numpy as np
    import pandas as pd
    from scipy.sparse import csr_matrix
    
    arr = np.random.random(size=(1000, 5))
    arr[arr < .9] = 0
    sp_arr = csr_matrix(arr)
    sdf = pd.SparseDataFrame(sp_arr)
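
`SparseDataFrame` was removed in pandas 1.0, so the snippet above only runs on older releases. On newer versions the same conversion can be sketched with the `DataFrame.sparse.from_spmatrix` accessor (available since pandas 0.25). This is a minimal sketch assuming a recent pandas and SciPy; the variable names mirror the example above and the seed is arbitrary:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
arr = rng.random(size=(1000, 5))
arr[arr < .9] = 0               # zero out roughly 90% of the entries
sp_arr = csr_matrix(arr)

# Build a DataFrame whose columns are sparse-backed, without densifying first.
sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr)

# Fraction of stored (non-fill) values across the frame.
density = sdf.sparse.density

# Convert back to a SciPy COO matrix when needed.
back = sdf.sparse.to_coo()
```

Index and column labels can be supplied through the `index` and `columns` arguments of `from_spmatrix`.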
    
  • 2020-11-27 03:34

    A direct conversion is not supported ATM. Contributions are welcome!

    Try this. It should be fine on memory, since a SparseSeries is much like a csc_matrix (for one column) and is quite space-efficient:

    In [36]: row = np.array([0,2,2,0,1,2])
    
    In [37]: col = np.array([0,0,1,2,2,2])
    
    In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')
    
    In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )
    
    In [40]: m
    Out[40]: 
    <3x3 sparse matrix of type '<type 'numpy.float64'>'
            with 6 stored elements in Compressed Sparse Column format>
    
    In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                                  for i in np.arange(m.shape[0]) ])
    Out[46]: 
       0  1  2
    0  1  0  4
    1  0  0  5
    2  2  3  6
    
    In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                                       for i in np.arange(m.shape[0]) ])
    
    In [48]: type(df)
    Out[48]: pandas.sparse.frame.SparseDataFrame
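
`SparseSeries` and `SparseDataFrame` no longer exist in pandas 1.0 and later. A rough modern analogue of the per-slice approach above is to wrap each column in a `pd.arrays.SparseArray`. This is a sketch under assumptions, not the answer's original method: the `row` array is inferred from the printed output, and the matrix is sliced by column rather than by row because column slicing is the cheap direction for CSC:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

row = np.array([0, 2, 2, 0, 1, 2])  # assumed; consistent with the output shown
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

# Densify one column at a time and wrap it in a SparseArray.
# fill_value=0 is explicit because float SparseArrays default to NaN.
columns = {
    j: pd.arrays.SparseArray(m[:, j].toarray().ravel(), fill_value=0)
    for j in range(m.shape[1])
}
df = pd.DataFrame(columns)
```

Only one dense column exists at any time, so peak memory stays close to the size of the sparse data plus a single column.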
    