Argmax of each row or column in scipy sparse matrix

Anonymous (unverified), submitted on 2019-12-03 01:49:02

Question:

scipy.sparse.coo_matrix.max returns the maximum value of each row or column, given an axis. I want not the value but the index of the maximum value of each row or column. I haven't found an efficient way to do this yet, so I'll gladly accept any help.

Answer 1:

From scipy version 0.19, both csr_matrix and csc_matrix support argmax() and argmin() methods.
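A minimal sketch of what that looks like, assuming SciPy 0.19 or later. The small matrix is made up for illustration. As far as I know, implicit zeros take part in the comparison, so the result should agree with the dense argmax; the sketch only flattens the output because the sparse matrix classes may return it as a 2-D matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 4x4 matrix; most entries are implicit zeros once stored as CSR.
A = csr_matrix(np.array([[-3, 0, -7,  0],
                         [ 0, 0,  0, 11],
                         [ 0, 0,  0,  0],
                         [ 0, 0,  4,  0]]))

# One column index per row (axis=1) and one row index per column (axis=0).
# np.asarray(...).ravel() turns a possible (n, 1) or (1, n) matrix into a 1-D array.
row_argmax = np.asarray(A.argmax(axis=1)).ravel()
col_argmax = np.asarray(A.argmax(axis=0)).ravel()

print(row_argmax)
print(col_argmax)
```

Note that a row holding only negative stored values reports the position of an implicit zero, since zero is then the largest entry in that row.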



Answer 2:

I would suggest studying the code for moo._min_or_max_axis, where moo is a coo_matrix.

mat = mat.tocsc()  # for axis=0
mat.sum_duplicates()
major_index, value = mat._minor_reduce(min_or_max)
not_full = np.diff(mat.indptr)[major_index]

Depending on the axis, it prefers to work with csc over csr. I haven't had time to analyze this, but I'm guessing it should be possible to include argmax in the calculation.


This suggestion may not work. The key is the mat._minor_reduce method, which does, with some refinement:

ufunc.reduceat(mat.data, mat.indptr[:-1])

That is, it applies the ufunc to blocks of the matrix data array, using the indptr to define the blocks. np.add and np.maximum are ufuncs where this works (np.sum is the reduction of np.add). I don't know of an equivalent argmax ufunc.

In general if you want to do things by 'row' for a csr matrix (or col of csc), you either have to iterate over the rows, which is relatively expensive, or use this ufunc.reduceat to do the same thing over the flat mat.data vector.
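The reduceat idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name row_max_of_stored and the sample matrix are hypothetical, and the function looks only at explicitly stored values, so implicit zeros are ignored. Empty rows are skipped because reduceat needs strictly increasing block starts to treat each block as a separate row.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def row_max_of_stored(X):
    """Maximum of the explicitly stored values in each non-empty row.

    Returns (row_numbers, maxima). Hypothetical helper for illustration only.
    """
    X = csr_matrix(X)
    row_lengths = np.diff(X.indptr)             # stored values per row
    nonempty = np.flatnonzero(row_lengths > 0)  # rows that have any stored value
    starts = X.indptr[:-1][nonempty]            # offset in X.data where each such row begins
    # One maximum per block of X.data; each block runs to the next start offset.
    return nonempty, np.maximum.reduceat(X.data, starts)

A = coo_matrix((np.array([-3, 4, 11, -7]),
                (np.array([0, 3, 1, 0]), np.array([0, 2, 3, 2]))),
               shape=(4, 4))
rows, maxima = row_max_of_stored(A)
print(rows, maxima)
```

This yields values, not positions; turning it into an argmax still needs an extra step that maps each maximum back to its index in X.indices.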

The question "group argmax/argmin over partitioning indices in numpy" tries to perform an argmax.reduceat. The solution there might be adaptable to a sparse matrix.



Answer 3:

If A is your scipy.sparse.coo_matrix, then you get the row and column of the maximum value as follows:

I = A.data.argmax()
maxrow = A.row[I]
maxcol = A.col[I]

To get the index of the maximum value on each row, see the EDIT below:

from scipy.sparse import coo_matrix
import numpy as np
row = np.array([0, 3, 1, 0])
col = np.array([0, 2, 3, 2])
data = np.array([-3, 4, 11, -7])
A = coo_matrix((data, (row, col)), shape=(4, 4))
print(A.toarray())
nrRows = A.shape[0]
maxrowind = []
for i in range(nrRows):
    r = A.getrow(i)  # r is a 1 x A.shape[1] matrix
    # column of the largest stored value, or 0 if the row has no stored values
    maxrowind.append(r.indices[r.data.argmax()] if r.nnz else 0)
print(maxrowind)

r.nnz is the count of explicitly stored values (i.e. nonzero values).



Answer 4:

The latest release of the numpy_indexed package (disclaimer: I am its author) can solve this problem in an efficient and elegant manner:

import numpy_indexed as npi
col, argmax = npi.group_by(coo.col).argmax(coo.data)
row = coo.row[argmax]

Here we group by col, so it's the argmax over the columns; swapping row and col will give you the argmax over the rows.
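For readers without numpy_indexed, the same grouping idea can be approximated in plain NumPy. This is a swapped-in technique (sort by column, then by value, and take the last entry of each column), not how numpy_indexed is implemented. The helper name coo_argmax_per_column is hypothetical, it only considers explicitly stored values, and it assumes the COO matrix has no duplicate entries.

```python
import numpy as np
from scipy.sparse import coo_matrix

def coo_argmax_per_column(coo):
    """Return (cols, rows): for each column with stored entries, the row
    holding its largest stored value. Hypothetical helper for illustration."""
    if coo.nnz == 0:
        return np.array([], dtype=int), np.array([], dtype=int)
    # lexsort uses the last key as the primary key: sort by column, then by value.
    order = np.lexsort((coo.data, coo.col))
    sorted_col = coo.col[order]
    # The last element of each run of equal column indices is that column's maximum.
    is_last = np.r_[sorted_col[1:] != sorted_col[:-1], True]
    last = np.flatnonzero(is_last)
    return sorted_col[last], coo.row[order[last]]

coo = coo_matrix((np.array([-3, 4, 11, -7]),
                  (np.array([0, 3, 1, 0]), np.array([0, 2, 3, 2]))),
                 shape=(4, 4))
cols, rows = coo_argmax_per_column(coo)
print(cols, rows)
```

Swapping coo.row and coo.col inside the helper would give the per-row version instead.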



Answer 5:

Expanding on the answers from @hpaulj and @joeln and using code from "group argmax/argmin over partitioning indices in numpy" as suggested, this function calculates the per-row argmax (axis=1) for a CSR matrix or the per-column argmax (axis=0) for a CSC matrix:

import numpy as np
import scipy.sparse as sp
def csr_csc_argmax(X, axis=None):
    is_csr = isinstance(X, sp.csr_matrix)
    is_csc = isinstance(X, sp.csc_matrix)
    assert is_csr or is_csc
    # axis may be omitted; if given, it must match the storage order
    assert axis is None or (is_csr and axis == 1) or (is_csc and axis == 0)
    major_size = X.shape[0 if is_csr else 1]
    major_lengths = np.diff(X.indptr)  # group lengths: stored values per row (CSR) or column (CSC)
    major_not_empty = (major_lengths > 0)
    result = -np.ones(shape=(major_size,), dtype=X.indices.dtype)
    split_at = X.indptr[:-1][major_not_empty]
    maxima = np.zeros((major_size,), dtype=X.dtype)
    maxima[major_not_empty] = np.maximum.reduceat(X.data, split_at)
    # positions in X.data whose value equals the maximum of their own group
    all_argmax = np.flatnonzero(np.repeat(maxima, major_lengths) == X.data)
    # the first such position in each group is that group's argmax
    result[major_not_empty] = X.indices[all_argmax[np.searchsorted(all_argmax, split_at)]]
    return result

It returns -1 for the argmax of any rows (CSR) or columns (CSC) that are completely sparse (i.e., that are completely zero after X.eliminate_zeros()).


