Building and updating a sparse matrix in python using scipy

北海茫月 2020-12-24 14:39

I'm trying to build and update a sparse matrix as I read data from a file. The matrix is of size 100000 x 40000.

What is the most efficient way of updating it?

3 Answers
  • 2020-12-24 15:04
    import scipy.sparse
    
    rows = [2, 236, 246, 389, 1691]
    cols = [117, 3, 34, 2757, 74, 1635, 52]
    prod = [(x, y) for x in rows for y in cols] # all (row, col) combinations
    r = [x for (x, y) in prod] # row indices
    c = [y for (x, y) in prod] # column indices
    data = [1] * len(r)
    m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))
    

    I think it works well and needs no explicit for loops. It directly follows the coo_matrix((data, (i, j)), shape=(M, N)) constructor form from the docs. The result is:

    <100000x40000 sparse matrix of type '<type 'numpy.int32'>'
        with 35 stored elements in COOrdinate format>
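For longer index lists, the same coordinates can be generated with NumPy instead of list comprehensions. This is a minimal sketch that assumes rows and cols are the index lists from above: np.repeat and np.tile produce the (row, col) pairs in the same order as the nested comprehension. The explicit dtype=np.int32 is an assumption chosen only to match the output shown.

```python
import numpy as np
import scipy.sparse

rows = np.array([2, 236, 246, 389, 1691])
cols = np.array([117, 3, 34, 2757, 74, 1635, 52])

# Every (row, col) pair, in the same order as the nested list comprehension:
# each row index is repeated once per column, and the column list is tiled once per row.
r = np.repeat(rows, cols.size)
c = np.tile(cols, rows.size)
data = np.ones(r.size, dtype=np.int32)

m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))
print(m.nnz)  # 35
```

The output is the same 35-element COO matrix. The only change is that the index arrays are built in vectorized form.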
    
  • 2020-12-24 15:12

    This answer expands on @behzad.nouri's comment. To increment the values at the "outer product" of your lists of row and column indices, create them as NumPy arrays shaped for broadcasting. Here, that means putting the rows into a column. For example (with numpy imported as np and lil_matrix imported from scipy.sparse):

    In [59]: a = lil_matrix((4,4), dtype=int)
    
    In [60]: a.toarray()
    Out[60]: 
    array([[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]])
    
    In [61]: rows = np.array([1,3]).reshape(-1, 1)
    
    In [62]: rows
    Out[62]: 
    array([[1],
           [3]])
    
    In [63]: cols = np.array([0, 2, 3])
    
    In [64]: a[rows, cols] += np.ones((rows.size, cols.size))
    
    In [65]: a.toarray()
    Out[65]: 
    array([[0, 0, 0, 0],
           [1, 0, 1, 1],
           [0, 0, 0, 0],
           [1, 0, 1, 1]])
    
    In [66]: rows = np.array([0, 1]).reshape(-1,1)
    
    In [67]: cols = np.array([1, 2])
    
    In [68]: a[rows, cols] += np.ones((rows.size, cols.size))
    
    In [69]: a.toarray()
    Out[69]: 
    array([[0, 1, 1, 0],
           [1, 1, 2, 1],
           [0, 0, 0, 0],
           [1, 0, 1, 1]])
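A row array of shape (n, 1) paired with a flat column array broadcasts to an n x m grid of index pairs. The sketch below is the same session in script form. It uses np.ix_, which builds that pair of broadcastable index arrays directly. The explicit read-add-write step (in place of +=) and toarray() (in place of .A) are choices made for this sketch, not part of the original answer.

```python
import numpy as np
from scipy.sparse import lil_matrix

a = lil_matrix((4, 4), dtype=int)

# np.ix_ returns index arrays of shape (n, 1) and (1, m). They broadcast to an
# n x m grid, just like rows.reshape(-1, 1) combined with a flat cols array.
idx = np.ix_([1, 3], [0, 2, 3])
a[idx] = a[idx].toarray() + 1  # read the block, add 1, write it back

idx = np.ix_([0, 1], [1, 2])
a[idx] = a[idx].toarray() + 1

print(a.toarray())
```

This prints the same array as Out[69]. As with NumPy fancy indexing, a position listed twice in rows or cols is incremented only once per update.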
    
  • 2020-12-24 15:15

    One possible approach is to create a second matrix with 1s at the new coordinates and add it to the existing one:

    >>> import numpy as np
    >>> import scipy.sparse as sps
    >>> rows, cols = 1000, 2000
    >>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
    >>> for j in range(100): # add 100 sets of 100 1's
    ...     r = np.random.randint(rows, size=100)
    ...     c = np.random.randint(cols, size=100)
    ...     d = np.ones((100,))
    ...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
    ... 
    >>> sps_acc
    <1000x2000 sparse matrix of type '<type 'numpy.float64'>'
        with 9985 stored elements in Compressed Sparse Row format>
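Each addition in the loop above builds a new matrix, so the total cost grows with the number of batches. A common alternative is to collect the coordinates and build the matrix once. This sketch assumes all batches fit in memory as index arrays. Converting COO to CSR sums entries at repeated coordinates, which is why fewer than 10000 elements are stored in the output above. The seeded Generator is an addition for reproducibility and does not appear in the original answer.

```python
import numpy as np
import scipy.sparse as sps

rows, cols = 1000, 2000
rng = np.random.default_rng(0)

# Collect each batch's coordinates instead of building a new matrix per batch.
r_parts, c_parts = [], []
for _ in range(100):  # 100 batches of 100 ones, as in the answer above
    r_parts.append(rng.integers(rows, size=100))
    c_parts.append(rng.integers(cols, size=100))

r = np.concatenate(r_parts)
c = np.concatenate(c_parts)
d = np.ones(r.size)

# A single COO construction; tocsr() sums entries at duplicate coordinates.
once = sps.coo_matrix((d, (r, c)), shape=(rows, cols)).tocsr()

# For comparison: the repeated-addition loop on the same batches.
acc = sps.coo_matrix((rows, cols))
for rp, cp in zip(r_parts, c_parts):
    acc = acc + sps.coo_matrix((np.ones(rp.size), (rp, cp)), shape=(rows, cols))

print(once.nnz, abs(once - acc).max())
```

Both approaches give the same matrix. The single construction avoids re-merging the accumulated entries on every batch.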
    