sparse 3d matrix/array in Python?

借酒劲吻你 2020-12-02 15:13

In SciPy, we can construct a sparse matrix using scipy.sparse.lil_matrix() etc., but such a matrix is always 2D.

I am wondering if there is an existing data structure for sp

6 answers
  • 2020-12-02 15:38

    An alternative answer as of 2017 is the sparse package. According to its own documentation, it implements sparse multidimensional arrays on top of NumPy and scipy.sparse by generalizing the scipy.sparse.coo_matrix layout.

    Here's an example taken from the docs:

    import numpy as np
    n = 1000
    ndims = 4
    nnz = 1000000
    coords = np.random.randint(0, n - 1, size=(ndims, nnz))
    data = np.random.random(nnz)
    
    import sparse
    x = sparse.COO(coords, data, shape=((n,) * ndims))
    x
    # <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>
    
    x.nbytes
    # 16000000
    
    y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))
    
    y
    # <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
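
To make the "generalized COO layout" concrete: a COO array is just an (ndim, nnz) array of coordinates plus a length-nnz array of values. The sketch below is plain NumPy, not the sparse package's own code, and the helper names coo_to_dense and coo_lookup are hypothetical. It shows how such a layout maps back to a dense array and how a single element is looked up.

```python
import numpy as np


def coo_to_dense(coords, data, shape):
    """Scatter COO entries into a dense array.

    coords has shape (ndim, nnz): column n holds the index of entry n.
    Duplicate coordinates are summed, as scipy.sparse.coo_matrix.toarray() does.
    """
    dense = np.zeros(shape, dtype=data.dtype)
    np.add.at(dense, tuple(coords), data)  # unbuffered add handles duplicates
    return dense


def coo_lookup(coords, data, index):
    """Return the value stored at `index` (0 if absent), via a linear scan."""
    mask = np.all(coords == np.asarray(index)[:, None], axis=0)
    return data[mask].sum()


# Three entries in a 2 x 3 x 4 array; the last two share the index (1, 0, 3)
coords = np.array([[0, 1, 1],
                   [2, 0, 0],
                   [1, 3, 3]])
data = np.array([1.0, 2.0, 0.5])
dense = coo_to_dense(coords, data, (2, 3, 4))
```

A linear scan is fine for illustration; real COO implementations keep the coordinates sorted so lookups can use binary search instead.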
    
  • 2020-12-02 15:39

    I needed a 3D lookup table for x, y, z and came up with this solution.
    Why not fold two of the dimensions into one? I.e. use x and a combined 'yz' index as the two matrix dimensions.

    E.g. if x has 80 possible values, y has 100 and z has 20, you make the sparse matrix 80 by 2000 (since yz = 100 x 20):
    x dimension: as usual
    yz dimension: the first 100 columns represent z=0, y=0 to 99,
    the next 100 represent z=1, y=0 to 99, and so on.
    So a given element located at (x, y, z) is stored in the sparse matrix at (x, z*100 + y).
    If you need negative indices, add a fixed offset in the index translation. The solution can be extended to n dimensions if necessary.

    from scipy import sparse
    
    # x has 80 possible values; the combined yz axis has 100 * 20 = 2000 positions
    m = sparse.lil_matrix((80, 2000), dtype=float)
    
    def add_element(x, y, z, element):
        # (x, y, z) is stored at row x, column y + z*100
        m[x, y + z*100] = float(element)
    
    def get_element(x, y, z):
        return m[x, y + z*100]
    
    add_element(3, 2, 4, 2.2)
    add_element(20, 15, 7, 1.2)
    print(get_element(0, 0, 0))
    print(get_element(3, 2, 4))
    print(get_element(20, 15, 7))
    print("  This is m sparse:")
    print(m)
    
    ====================
    OUTPUT:
    0.0
    2.2
    1.2
      This is m sparse:
      (3, 402)    2.2
      (20, 715)   1.2
    ====================
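
The answer notes that the scheme extends to n dimensions and that negative indices need an offset. One way to generalize the y + z*100 rule is sketched below, assuming the first folded axis varies fastest (as in the answer) and a fixed per-axis offset. The class FlatIndex and its shape/offset parameters are hypothetical, not part of SciPy.

```python
class FlatIndex:
    """Map an n-dimensional integer index onto one column index and back.

    The first axis varies fastest, so for shape (100, 20) the index (y, z)
    maps to y + z*100. An optional per-axis offset allows negative indices.
    """

    def __init__(self, shape, offset=None):
        self.shape = tuple(shape)
        self.offset = tuple(offset) if offset is not None else (0,) * len(self.shape)
        # Stride of axis k = product of the sizes of all earlier axes
        self.strides = []
        size = 1
        for n in self.shape:
            self.strides.append(size)
            size *= n
        self.size = size  # number of columns the folded axis needs

    def flatten(self, index):
        if len(index) != len(self.shape):
            raise IndexError(f"expected {len(self.shape)} indices, got {len(index)}")
        col = 0
        for i, off, n, stride in zip(index, self.offset, self.shape, self.strides):
            j = i + off
            if not 0 <= j < n:
                raise IndexError(f"index {i} out of range for axis of size {n}")
            col += j * stride
        return col

    def unflatten(self, col):
        if not 0 <= col < self.size:
            raise IndexError(f"column {col} out of range")
        index = []
        for off, n in zip(self.offset, self.shape):
            index.append(col % n - off)
            col //= n
        return tuple(index)


yz = FlatIndex((100, 20))
shifted = FlatIndex((100, 20), offset=(50, 0))  # y may range from -50 to 49
```

With a lil_matrix of shape (80, yz.size), an element would then be stored at m[x, yz.flatten((y, z))], and unflatten recovers (y, z) from a stored column.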
    
  • 2020-12-02 15:42

    Rather than writing everything from scratch, it may be nicer to use scipy's sparse module as far as possible, which may lead to (much) better performance. I had a somewhat similar problem, but I only had to access the data efficiently, not perform any operations on them. Furthermore, my data were only sparse in two out of three dimensions.

    I have written a class that solves my problem and could, I think, easily be extended to satisfy the OP's needs. It may still hold some potential for improvement, though.

    import numpy as np
    import scipy.sparse as sp
    
    
    def _is_multi(idx):
        """True if idx selects several positions (a slice or an iterable of indices)."""
        return isinstance(idx, slice) or hasattr(idx, "__iter__")
    
    
    class Sparse3D:
        """
        Class to store and access 3-dimensional sparse matrices efficiently.
        The 2D slices are stacked vertically into one DOK matrix of shape
        (n0 * n1, n2), so element (i, j, k) is stored at row i * n1 + j, column k.
        """
        def __init__(self, *sparseMatrices):
            """
            Constructor
            Takes a stack of sparse 2D matrices with the same dimensions
            """
            self.data = sp.vstack(sparseMatrices, format="dok")
            self.shape = (len(sparseMatrices), *sparseMatrices[0].shape)
            # Row of self.data at which each 2D slice starts
            self._dim1_jump = np.arange(0, self.shape[1] * self.shape[0], self.shape[1])
            self._dim1 = np.arange(self.shape[0])
            self._dim2 = np.arange(self.shape[1])
    
        def __getitem__(self, pos):
            if not isinstance(pos, tuple):
                # Only the first axis is indexed
                if not _is_multi(pos):
                    return self.data[self._dim1_jump[pos] + self._dim2]
                return Sparse3D(*(self[i] for i in self._dim1[pos]))
            if len(pos) > 3:
                raise IndexError("too many indices for array")
            if not _is_multi(pos[0]) or not _is_multi(pos[1]):
                # At least one of the first two indices is a scalar, so the
                # matching rows of self.data can be addressed directly
                rows = self._dim1_jump[pos[0]] + self._dim2[pos[1]]
                if len(pos) == 2:
                    return self.data[rows]
                result = self.data[rows, pos[2]]
                # A scalar third index yields a column; transpose it to a row
                return result if _is_multi(pos[2]) else result.T
            # Both of the first two indices select several positions
            if len(pos) == 2:
                return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]]))
            if not _is_multi(pos[2]):
                return sp.vstack([self[self._dim1[pos[0]], i, pos[2]]
                                  for i in self._dim2[pos[1]]]).T
            return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]]
                              for i in self._dim1[pos[0]]))
    
        def toarray(self):
            return np.array([self[i].toarray() for i in range(self.shape[0])])
    
  • 2020-12-02 15:46

    Happy to suggest a (possibly obvious) implementation of this. It can be written in pure Python, or in C/Cython if you have the time, can accept new dependencies, and need it to be faster.

    A sparse matrix in N dimensions can assume most elements are empty, so we store only the non-empty elements in a dictionary keyed on index tuples:

    class NDSparseMatrix:
        def __init__(self):
            # Only non-zero entries are stored, keyed by their index tuple
            self.elements = {}
    
        def addValue(self, index, value):
            self.elements[index] = value
    
        def readValue(self, index):
            # Missing keys are implicit zeros (could also be 0.0 if using floats)
            return self.elements.get(index, 0)
    

    and you would use it like so:

    sparse = NDSparseMatrix()
    sparse.addValue((1,2,3), 15.7)
    should_be_zero = sparse.readValue((1,5,13))
    

    You could make this implementation more robust by verifying that the input is in fact a tuple and that it contains only integers, but that will slow things down, so I wouldn't worry about it unless you plan to release your code to the world later.
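
If you do want those checks, a minimal sketch of a stricter variant follows, assuming the shape is known up front. The class name CheckedSparseND and its shape/default parameters are hypothetical, not part of any library. It also uses __getitem__/__setitem__ so entries can be written as a[1, 2, 3], and it drops entries set back to the default so the dict stays sparse.

```python
from numbers import Integral


class CheckedSparseND:
    """Dict-of-keys sparse array with a fixed shape and validated indices."""

    def __init__(self, shape, default=0.0):
        self.shape = tuple(shape)
        self.default = default
        self.elements = {}

    def _check(self, index):
        # Must be a tuple with exactly one integer per axis, each within bounds
        if not isinstance(index, tuple) or len(index) != len(self.shape):
            raise IndexError(f"expected a tuple of {len(self.shape)} integers, got {index!r}")
        for i, n in zip(index, self.shape):
            if isinstance(i, bool) or not isinstance(i, Integral):
                raise TypeError(f"index entries must be integers, got {i!r}")
            if not 0 <= i < n:
                raise IndexError(f"index {i} out of range for axis of size {n}")
        return index

    def __setitem__(self, index, value):
        index = self._check(index)
        if value == self.default:
            # Storing the default would waste memory, so drop the key instead
            self.elements.pop(index, None)
        else:
            self.elements[index] = value

    def __getitem__(self, index):
        return self.elements.get(self._check(index), self.default)

    @property
    def nnz(self):
        return len(self.elements)


a = CheckedSparseND((4, 5, 6))
a[1, 2, 3] = 15.7
should_be_zero = a[1, 4, 5]
```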

    EDIT - a Cython implementation of the multiplication problem, assuming the other tensor is a 3-dimensional NumPy array (numpy.ndarray), might look like this:

    #cython: boundscheck=False
    #cython: wraparound=False
    
    import numpy as np
    cimport numpy as cnp
    cnp.import_array()
    
    def sparse_mult(object sparse, cnp.ndarray[double, ndim=3] u):
        cdef Py_ssize_t i, j, k
        # Boundary entries are skipped by the loops below and stay zero
        cdef cnp.ndarray[double, ndim=3] out = np.zeros((u.shape[0], u.shape[1], u.shape[2]), dtype=np.float64)
    
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                for k in range(1, u.shape[2] - 1):
                    # note, here you must define your own rank-3 multiplication rule, which
                    # is, in general, nontrivial, especially if LxMxN tensor...
    
                    # loop over a dummy variable (or two) and perform some summation:
                    out[i, j, k] = u[i, j, k] * sparse.readValue((i, j, k))
    
        return out
    

    You will always need to hand-roll this for the problem at hand, because (as mentioned in the code comment) you need to define which indices you're summing over, and be careful about the array lengths or things won't work!
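
As one concrete instance of choosing the summed index: contracting a rank-3 tensor T with a dense vector v over its last index, out[i, j] = sum over k of T[i, j, k] * v[k]. With the dict layout only the stored entries need visiting. A pure-Python sketch, assuming T is a dict of (i, j, k) -> value such as NDSparseMatrix.elements (the function name contract_last_axis is hypothetical):

```python
def contract_last_axis(elements, v):
    """Compute out[i, j] = sum_k T[i, j, k] * v[k] for a dict-of-keys tensor T.

    `elements` maps (i, j, k) tuples to values; `v` is any indexable sequence.
    Returns a dict mapping (i, j) to the summed value.
    """
    out = {}
    for (i, j, k), value in elements.items():
        out[(i, j)] = out.get((i, j), 0) + value * v[k]
    # Drop sums that cancelled to exactly zero so the result stays sparse
    return {key: val for key, val in out.items() if val != 0}


T = {(0, 0, 0): 2.0, (0, 0, 2): 3.0, (1, 2, 1): 4.0, (1, 1, 0): 1.0}
v = [1.0, 0.5, -1.0]
out = contract_last_axis(T, v)
```

The cost is proportional to the number of stored entries, not to L x M x N.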

    EDIT 2 - if the other matrix is also sparse, then you don't need to do the three way looping:

    def sparse_mult(sparse, other_sparse):
    
        out = NDSparseMatrix()
    
        # Only the stored (non-zero) entries of `sparse` are visited
        for key, value in sparse.elements.items():
            i, j, k = key
            # note, here you must define your own rank-3 multiplication rule, which
            # is, in general, nontrivial, especially if LxMxN tensor...
    
            # loop over a dummy variable (or two) and perform some summation
            # (example indices shown):
            out.addValue(key, out.readValue(key) +
                         other_sparse.readValue((i, j, k + 1)) * sparse.readValue((i - 3, j, k)))
    
        return out
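
For the sparse-times-sparse case, the simplest well-defined rule is the element-wise (Hadamard) product, out[i, j, k] = A[i, j, k] * B[i, j, k]. Only indices stored in both operands can be non-zero, so it is enough to loop over the smaller dict and probe the larger one. A sketch, assuming plain dicts keyed on index tuples like NDSparseMatrix.elements (hadamard is a hypothetical name):

```python
def hadamard(a, b):
    """Element-wise product of two dict-of-keys sparse tensors."""
    # Multiplication commutes, so iterate over the smaller operand
    if len(a) > len(b):
        a, b = b, a
    out = {}
    for key, value in a.items():
        other = b.get(key)
        if other is not None:
            product = value * other
            if product != 0:
                out[key] = product
    return out


A = {(0, 0, 0): 2.0, (1, 2, 3): 4.0, (2, 2, 2): 1.0}
B = {(1, 2, 3): 0.5, (5, 5, 5): 7.0}
C = hadamard(A, B)
```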
    

    My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:

    typedef struct {
      int index[3];
      float value;
    } entry_t;
    

    You'll then need some functions to allocate and maintain a dynamic array of such structs, and to search them as fast as you need; but you should profile the Python implementation before worrying about that.
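
Before dropping to C, the same idea (a compact array of (index, value) entries that can be searched quickly) can be prototyped in Python by keeping the index tuples sorted and using binary search from the standard bisect module. A sketch, assuming entries are inserted rarely and read often, since each insert into a Python list shifts O(nnz) elements (SortedEntries is a hypothetical name):

```python
from bisect import bisect_left


class SortedEntries:
    """Sorted parallel lists of index tuples and values, searched by bisection."""

    def __init__(self):
        self.keys = []    # index tuples, kept in lexicographic order
        self.values = []  # values[n] belongs to keys[n]

    def set(self, index, value):
        pos = bisect_left(self.keys, index)
        if pos < len(self.keys) and self.keys[pos] == index:
            self.values[pos] = value          # overwrite an existing entry
        else:
            self.keys.insert(pos, index)      # O(nnz) shift, like a C array insert
            self.values.insert(pos, value)

    def get(self, index, default=0.0):
        pos = bisect_left(self.keys, index)   # O(log nnz) lookup
        if pos < len(self.keys) and self.keys[pos] == index:
            return self.values[pos]
        return default

    def slice_first_axis(self, i):
        """Return all (index, value) pairs whose first index equals i."""
        # (i,) sorts before every (i, ...) and (i + 1,) after every (i, ...)
        lo = bisect_left(self.keys, (i,))
        hi = bisect_left(self.keys, (i + 1,))
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


s = SortedEntries()
s.set((1, 2, 3), 15.7)
s.set((0, 5, 1), 2.0)
s.set((1, 0, 0), -1.0)
s.set((1, 2, 3), 3.0)
```

A side benefit of the sorted layout over a dict is that all entries sharing a leading index are contiguous, so slicing along the first axis is cheap.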

  • 2020-12-02 15:55

    Have a look at sparray - sparse n-dimensional arrays in Python (by Jan Erik Solem). Also available on GitHub.

  • 2020-12-02 15:55

    I also needed a 3D sparse matrix, to solve the 2D heat equation (the 2 spatial dimensions are dense, but the time dimension is tridiagonal: the diagonal plus one off-diagonal on either side). I found this link to guide me. The trick is to create an array Number that maps the 2D grid of nodes to a 1D linear vector. Then build the 2D sparse matrix from lists of data values and (row, column) indices. Later, the Number array is used to arrange the answer back into a 2D array.

    [edit] It occurred to me after my initial post that this could be handled better with the .reshape(-1) method. After some research: reshape is better than flatten because it returns a view into the original array whenever possible, whereas flatten always copies the array. The code still uses the original Number array; I will try to update it later.[end edit]
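
A quick way to check the reshape/flatten claim, and to build the same Number mapping without the explicit double loop, is sketched below. It assumes the nY x nX layout of the script in this answer, where Number[j, i] = i*nY + j; the grid values are placeholders, not data from the answer.

```python
import numpy as np

nX, nY = 4, 3

# Same mapping as the double loop in the script: Number[j, i] = i*nY + j
Number = np.arange(nX * nY).reshape(nX, nY).T

grid = np.arange(12.0).reshape(nY, nX)  # placeholder 2D field on the nodes
flat_view = grid.reshape(-1)            # contiguous input, so this is a view
flat_copy = grid.flatten()              # always a new copy

flat_view[0] = 42.0                     # writes through to grid[0, 0]
print(grid[0, 0], flat_copy[0])         # 42.0 0.0

# Number is a transposed (non-contiguous) view, so reshape has to copy it
number_flat = Number.reshape(-1)

# Mapping a 1D solution vector back onto the 2D grid of nodes
u = np.arange(nX * nY, dtype=float)     # placeholder for the spsolve result
u2d = u[Number]                          # u2d[j, i] == u[Number[j, i]]
```

The "whenever possible" matters: reshaping the transposed Number array cannot be expressed as a view, so NumPy silently copies.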

    I test it by creating a 1D random vector b and solving for a second vector x. Multiplying x by the sparse 2D matrix then gives back b.

    Note: I repeat this many times in a loop with exactly the same matrix M, so you might think it would be more efficient to solve for inverse(M). But the inverse of M is not sparse, so I think (but have not tested) that using spsolve is a better solution. "Best" probably depends on how large your matrix is.
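
Since M is the same on every iteration, another option (not something the answer tested) is to LU-factorize M once with scipy.sparse.linalg.factorized and reuse the returned solver for each right-hand side. The factors of a banded matrix typically stay much sparser than inverse(M). A minimal sketch, using a 1D tridiagonal stand-in for M built with scipy.sparse.diags rather than the 2D stencil from the script:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized

n, r = 50, 0.1
# Tridiagonal stand-in for M: 1 + 2r on the diagonal, -r on both off-diagonals
M = sp.diags([[-r] * (n - 1), [1 + 2 * r] * n, [-r] * (n - 1)],
             [-1, 0, 1], format="csc")

solve = factorized(M)  # factorize once; CSC input avoids a format conversion

rng = np.random.default_rng(0)
u = rng.random(n)
for _ in range(10):
    # Each time step reuses the stored factorization instead of refactorizing
    u = solve(u)
```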

    #!/usr/bin/env python3
    # testSparse.py
    # profhuster
    
    import numpy as np
    import scipy.sparse as sM
    import scipy.sparse.linalg as spLA
    from array import array
    from numpy.random import rand, seed
    seed(101520)
    
    nX = 4
    nY = 3
    r = 0.1
    
    def loadSpNodes(nX, nY, r):
        # Matrix to map 2D array of nodes to 1D array
        Number = np.zeros((nY, nX), dtype=int)
    
        # Map each element of the 2D array to a 1D array
        iM = 0
        for i in range(nX):
            for j in range(nY):
                Number[j, i] = iM
                iM += 1
        print(f"Number = \n{Number}")
    
        # Now create a sparse matrix of the "stencil"
        diagVal = 1 + 4 * r
        offVal = -r
        d_list = array('d')  # double precision, to match the float64 right-hand side
        i_list = array('i')
        j_list = array('i')
        # Loop over the 2D nodes matrix
        for i in range(nX):
            for j in range(nY):
                # Recall the 1D number
                iSparse = Number[j, i]
                # populate the diagonal
                d_list.append(diagVal)
                i_list.append(iSparse)
                j_list.append(iSparse)
                # Now, for each rectangular neighbor, add the
                # off-diagonal entries
                # Check the bounds explicitly, so boundary nodes work
                # (a negative index would silently wrap around in NumPy)
                for (jj, ii) in ((j+1, i), (j-1, i), (j, i+1), (j, i-1)):
                    if 0 <= jj < nY and 0 <= ii < nX:
                        iNeigh = Number[jj, ii]
                        d_list.append(offVal)
                        i_list.append(iSparse)
                        j_list.append(iNeigh)
        spNodes = sM.coo_matrix((d_list, (i_list, j_list)), shape=(nX*nY,nX*nY))
        return spNodes
    
    
    MySpNodes = loadSpNodes(nX, nY, r)
    print(f"Sparse Nodes = \n{MySpNodes.toarray()}")
    b = rand(nX*nY)
    print(f"b=\n{b}")
    x = spLA.spsolve(MySpNodes.tocsr(), b)
    print(f"x=\n{x}")
    print(f"Multiply back together=\n{MySpNodes @ x}")
    