How do you calculate a best-fit line in Python, and then plot it on a scatterplot in matplotlib?
I calculate the linear best-fit line using Ordinary Least Squ...
The accepted answer to this question provides a small multi poly fit library which will do exactly what you need using numpy, and you can plug the result into the plotting as I've outlined below.
You would just pass your arrays of x and y points and the degree (order) of fit you require into multipolyfit. This returns the coefficients, which you can then use for plotting with numpy's polyval.
Note: The code below has been amended to do multivariate fitting, but the plot image was part of the earlier, non-multivariate answer.
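import numpy
import matplotlib.pyplot as plt
import multipolyfit as mpf

data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]
x, y = numpy.array(data, dtype=float).T  # arrays, so x+1 and x-1 work element-wise
plt.plot(x, y, 'kx')

deg = 3  # degree of the fit; matches the plot label below
# One row per sample, one column per variable (assumed layout for multipolyfit)
stacked_x = numpy.column_stack([x, x+1, x-1])
# Assumes the package exposes multipolyfit(xs, y, deg); the module itself is not callable
coeffs = mpf.multipolyfit(stacked_x, y, deg)

x2 = numpy.arange(min(x)-1, max(x)+1, .01)  # use more points for a smoother plot
# polyval treats coeffs as a one-variable polynomial, so this curve is only
# meaningful for coefficients from numpy.polyfit(x, y, deg)
y2 = numpy.polyval(coeffs, x2)
plt.plot(x2, y2, label="deg=3")
plt.legend()
plt.show()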
Note: This was part of the earlier answer, and it is still relevant if you don't have multivariate data. Instead of coeffs = mpf(..., use coeffs = numpy.polyfit(x,y,3)
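For the non-multivariate case, the swap described in the note might look like the sketch below. It reuses the sample data from the answer; the helper name show_fit is made up for illustration, and the plotting is wrapped in a function so the fit itself can be run without a display.

```python
import numpy as np

data = [[1, 1], [4, 3], [8, 3], [11, 4], [10, 7], [15, 11], [16, 12]]
x, y = np.array(data, dtype=float).T   # split the columns into x and y arrays

coeffs = np.polyfit(x, y, 3)           # cubic least-squares fit, highest power first

# Evaluate the fitted polynomial on a dense grid for a smooth curve.
x2 = np.arange(x.min() - 1, x.max() + 1, 0.01)
y2 = np.polyval(coeffs, x2)

def show_fit():
    """Hypothetical helper: draw the data points and the fitted curve."""
    import matplotlib.pyplot as plt    # imported here so the fit runs without matplotlib
    plt.plot(x, y, 'kx', label="data")
    plt.plot(x2, y2, label="deg=3")
    plt.legend()
    plt.show()
```

Calling show_fit() should draw the scatter points with the cubic curve through them.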
For non-multivariate data sets, the easiest way to do this is probably with numpy's polyfit:
numpy.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)
Least squares polynomial fit.
Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
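Since the question asks about a straight best-fit line, here is a minimal sketch of the deg=1 case using the same sample points. It also evaluates the line by hand to illustrate the coefficient order in the formula above (highest power first); the variable names are only illustrative.

```python
import numpy as np

x = np.array([1, 4, 8, 11, 10, 15, 16], dtype=float)
y = np.array([1, 3, 3, 4, 7, 11, 12], dtype=float)

# deg=1 gives the ordinary least-squares straight line.
p = np.polyfit(x, y, 1)
slope, intercept = p   # p[0] multiplies x**1, p[1] is the constant term

# Evaluate by hand and with polyval; both follow p[0]*x**deg + ... + p[deg].
manual = slope * x + intercept
line = np.polyval(p, x)
```

The line array can be passed to plt.plot(x, line) on top of the scatter points, in the same way as the higher-degree curve.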