I have 2 arrays of equal length. The following function attempts to calculate the slope using these arrays. It returns the average of the slope between each points. For the foll
I bet the other two methods are computing the least-squares fit, whereas you are not.
When I verify this conjecture using R, I too get the slope of about 0.755:
> summary(lm(y~x))
Call:
lm(formula = y ~ x)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.265e+03 1.793e+02 -7.053 5.97e-05 ***
x 7.551e-01 9.155e-02 8.247 1.73e-05 ***
The relevant number is the 7.551e-01
. It is also worth noting that the line has an intercept of about -1265.
Here is a picture of the least-squares fit:
As to implementing this in your code, see Compute least squares using java
You should be dividing by x_values.length - 1
. Number of slopes is pairwise.
Edit : Wiki example in my comments shows how to calculate the alpha and beta which determines the slope of the linear regression line.
Edit: use Apache Commons Math class SimpleRegression if that's an option. Else, here's a method that calculates slope and also intercept, should yield the same results as excel and apache:
private static double intercept(List<Double> yList, List<Double> xList) {
if (yList.size() != xList.size())
throw new IllegalArgumentException("Number of y and x must be the same");
if (yList.size() < 2)
throw new IllegalArgumentException("Need at least 2 y, x");
double yAvg = average(yList);
double xAvg = average(xList);
double sumNumerator = 0d;
double sumDenominator = 0d;
for (int i = 0; i < yList.size(); i++) {
double y = yList.get(i);
double x = xList.get(i);
double yDiff = y - yAvg;
double xDiff = x - xAvg;
double numerator = xDiff * yDiff;
double denominator = xDiff * xDiff;
sumNumerator += numerator;
sumDenominator += denominator;
}
double slope = sumNumerator / sumDenominator;
double intercept = yAvg - (slope * xAvg);
return intercept;
}
private static double average(Collection<Double> doubles) {
return doubles.stream().collect(Collectors.averagingDouble(d -> d));
}
Sources: Excel doc for SLOPE Excel doc for INTERCEPT
This function will not help you much, as it does not take into account the breadths of the various line segments. Consider the differences in applying it to the points (0,0), (1000,1000), and (1001, 2000) versus (0,0), (1,1), and (2, 1001). Both cases have successive slopes 1 and 1000, yet they look greatly different.
You need to implement the method of least squares: http://en.wikipedia.org/wiki/Least_squares to find the line that best approximates your data set.
One more piece of advice: never throw a java.lang.Exception
. Always choose a more-specific exception, even if you must write the class yourself. People using your code will need to handle java.lang.Exception
, which interferes badly with their other code.