I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh().
Here's my code:
import numpy as np import scipy.sparse import scipy.sparse.linalg as SLA import sklearn.utils.graph as graph W = self._sparse_rbf_kernel(self.X_, self.datashape) D = scipy.sparse.csc_matrix(np.diag(np.array(W.sum(axis = 0))[0])) L = graph.graph_laplacian(W) # D - W vals, vects = SLA.eigsh(L, k = self.k, M = D, which = 'SM', sigma = 0, maxiter = 1000)
The sklearn
library refers to the scikit-learn package, specifically this method for calculating a graph laplacian from a sparse SciPy matrix.
_sparse_rbf_kernel
is a method I wrote to compute pairwise affinities of the data points. It operates by creating a sparse affinity matrix from image data, specifically by only computing pairwise affinities for the 8-neighborhoods around each pixel (instead of pairwise for all pixels with scikit-learn's rbf_kernel
method, which for the record doesn't fix this either).
Since the laplacian is unnormalized, I'm looking for the smallest eigenvalues and corresponding eigenvectors of the system. I understand that ARPACK is ill-suited for finding small eigenvalues, but I'm trying to use shift-invert to find these values and am still not having much success.
With the above arguments (specifically, sigma = 0
), I get the following error:
RuntimeError: Factor is exactly singular
With sigma = 0.001
, I get a different error:
scipy.sparse.linalg.eigen.arpack.arpack.ArpackNoConvergence: ARPACK error -1: No convergence (1001 iterations, 0/5 eigenvectors converged)
I've tried all three different values for mode
with the same result. Any suggestions for using the SciPy sparse library for finding small eigenvalues of a large system?