How can I remove a column from a sparse matrix efficiently?

 ̄綄美尐妖づ 提交于 2019-11-30 13:13:00

问题


If I am using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?


回答1:


I've been wanting this myself and in truth there isn't a great built-in way to do it yet. Here's a way to do it. I chose to make a subclass of lil_matrix and add the remove_col function. If you want, you can instead add the removecol function to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file. Here's the code:

from scipy import sparse
from bisect import bisect_left

class lil2(sparse.lil_matrix):
    def removecol(self,j):
        if j < 0:
            j += self.shape[1]

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows
        data = self.data
        for i in xrange(self.shape[0]):
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            for pos2 in xrange(pos,len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0],self._shape[1]-1)

I have tried it out and don't see any bugs. I certainly think that it is better than slicing the column out, which just creates a new matrix as far as I know.

I decided to make a removerow function as well, but I don't think that it is as good as removecol. I'm limited by not being able to remove one row from an ndarray in the way that I would like. Here is removerow which can be added to the above class

    def removerow(self,i):
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        self.rows = numpy.delete(self.rows,i,0)
        self.data = numpy.delete(self.data,i,0)
        self._shape = (self._shape[0]-1,self.shape[1])

Perhaps I should submit these functions to the Scipy repository.




回答2:


Much simpler and faster. You might not even need the conversion to csr, but I just know for sure that it works with csr sparse matrices and converting between shouldn't be an issue.

from scipy import sparse

x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:,col_list])



回答3:


For a sparse csr matrix (X) and a list of indices to drop (index_to_drop):

to_keep = list(set(xrange(X.shape[1]))-set(index_to_drop))    
new_X = X[:,to_keep]

It is easy to convert lil_matrices to csr_matrices. Check tocsr() in lil_matrix documentation

Note however that going from csr to lil matrices using tolil() is expensive. So, this choice is good when you do not require to have your matrix in lil format.




回答4:


I'm new to python so my answer is probably wrong, but I was wondering why something like the following won't be efficient?

Lets say your lil_matrix is called mat and that you want to remove the i-th column:

mat=hstack( [ mat[:,0:i] , mat[:,i+1:] ] )

Now the matrix will turn to a coo_matrix after that but you can turn it back to lil_matrix.

Ok, I understand that this will have to create the two matrices inside the hstack before it does the assignment to the mat variable so it would be like having the original matrix plus one more at the same time but I guess if the sparsity is big enough then I think there shouldn't be any memory problems (since memory (and time) is the whole reason of using sparse matrices).




回答5:



def removecols(W, col_list):
        if min(col_list) = W.shape[1]:
                raise IndexError('column index out of bounds')
        rows = W.rows
        data = W.data
        for i in xrange(M.shape[0]):
            for j in col_list:
                pos = bisect_left(rows[i], j)
                if pos == len(rows[i]):
                        continue
                elif rows[i][pos] == j:
                        rows[i].pop(pos)
                        data[i].pop(pos)
                        if pos == len(rows[i]):
                                continue
                for pos2 in xrange(pos,len(rows[i])):
                        rows[i][pos2] -= 1
        W._shape = (W._shape[0], W._shape[1]-len(col_list))
        return W

Just rewrote your code to work with col_list as input - maybe this will be helpful for somebody.




回答6:


By looking at the notes for each sparse matrix, specifically in our case is csc matrix it has the following advantages as listed in the documentation [1]

  • efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
  • efficient column slicing
  • fast matrix vector products (CSR, BSR may be faster)

If you have the column indices you want to remove, just use slicing. For removing rows use csr matrix since it is efficient in row slicing



来源:https://stackoverflow.com/questions/2368544/how-can-i-remove-a-column-from-a-sparse-matrix-efficiently

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!