Faster way of finding least-square solution for large matrix


Question


I want to find the least-squares solution of a matrix, and I am using NumPy's linalg.lstsq function:

weights = np.linalg.lstsq(semivariance, prediction, rcond=None)

The dimensions of my variables are:

semivariance is a float array of shape 5030 x 5030

prediction is a 1D array of length 5030

The problem is that it takes approximately 80 seconds to return the weights, and I have to repeat the calculation about 10,000 times, so the computation time adds up quickly.

Is there a faster or more Pythonic way to do this?


Answer 1:


@Brenlla appears to be right: even if you perform least squares by solving the normal equations (equivalent to applying the Moore-Penrose pseudoinverse), it is significantly faster than np.linalg.lstsq:

import numpy as np
import time

# Random data with the same shapes as in the question (a 5030 x 5030 system)
semivariance = np.random.uniform(0, 100, [5030, 5030]).astype(np.float64)
prediction = np.random.uniform(0, 100, [5030, 1]).astype(np.float64)

# Baseline: SVD-based least squares via np.linalg.lstsq
start = time.time()
weights_lstsq = np.linalg.lstsq(semivariance, prediction, rcond=None)
print("Took: {}".format(time.time() - start))

>>> Took: 34.65818190574646

# Normal equations: solve (A^T A) x = A^T b directly
start = time.time()
weights_pseudo = np.linalg.solve(semivariance.T.dot(semivariance),
                                 semivariance.T.dot(prediction))
print("Took: {}".format(time.time() - start))

>>> Took: 2.0434153079986572

# The two solutions agree (lstsq returns a tuple; element [0] is the solution)
np.allclose(weights_lstsq[0], weights_pseudo)

>>> True

The above is not on your exact matrices, but the concept likely transfers. np.linalg.lstsq solves an optimisation problem, minimizing ||b - Ax||^2 to find x in Ax = b. That kind of iterative/optimisation approach pays off on extremely large problems, which is why linear models in neural networks are often fit with gradient descent, but in this case your matrices just aren't large enough for it to be a benefit.
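One further thought, under an assumption the question doesn't confirm: if the semivariance matrix stays fixed across the ~10,000 repeats and only the prediction vector changes (as it often does in kriging-style workflows), you can factor the matrix once and reuse the factorization for every right-hand side. A minimal sketch using SciPy's LU routines (scipy.linalg.lu_factor / lu_solve), with placeholder random data standing in for the real inputs:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)

# Placeholder data; assumes the matrix is square, non-singular, and fixed
semivariance = rng.uniform(0, 100, (5030, 5030))
predictions = rng.uniform(0, 100, (10, 5030))  # e.g. 10 of the ~10,000 right-hand sides

# Pay the O(n^3) factorization cost once
lu, piv = lu_factor(semivariance)

# Each solve then reuses the factorization: only O(n^2) per right-hand side
weights = [lu_solve((lu, piv), p) for p in predictions]

Each lu_solve is just a forward/back substitution, so the expensive factorization is done once instead of 10,000 times. If the matrix genuinely changes on every repeat, this won't help and the normal-equations approach above is the better fit.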



Source: https://stackoverflow.com/questions/50288513/faster-way-of-finding-least-square-solution-for-large-matrix
