Are there any Linear Regression Function in SQL Server 2005/2008, similar to the the Linear Regression functions in Oracle ?
To add to @icc97 answer, I have included the weighted versions for the slope and the intercept. If the values are all constant the slope will be NULL (with the appropriate settings SET ARITHABORT OFF; SET ANSI_WARNINGS OFF;
) and will need to be substituted for 0 via coalesce().
Here is a solution written in SQL:
with d as (select segment,w,x,y from somedatasource)
select segment,
avg(y) - avg(x) *
((count(*) * sum(x*y)) - (sum(x)*sum(y)))/
((count(*) * sum(x*x)) - (Sum(x)*Sum(x))) as intercept,
((count(*) * sum(x*y)) - (sum(x)*sum(y)))/
((count(*) * sum(x*x)) - (sum(x)*sum(x))) AS slope,
avg(y) - ((avg(x*y) - avg(x)*avg(y))/var_samp(X)) * avg(x) as interceptUnstable,
(avg(x*y) - avg(x)*avg(y))/var_samp(X) as slopeUnstable,
(Avg(x * y) - Avg(x) * Avg(y)) / (stddev_pop(x) * stddev_pop(y)) as correlationUnstable,
(sum(y*w)/sum(w)) - (sum(w*x)/sum(w)) *
((sum(w)*sum(x*y*w)) - (sum(x*w)*sum(y*w)))/
((sum(w)*sum(x*x*w)) - (sum(x*w)*sum(x*w))) as wIntercept,
((sum(w)*sum(x*y*w)) - (sum(x*w)*sum(y*w)))/
((sum(w)*sum(x*x*w)) - (sum(x*w)*sum(x*w))) as wSlope,
(count(*) * sum(x * y) - sum(x) * sum(y)) / (sqrt(count(*) * sum(x * x) - sum(x) * sum(x))
* sqrt(count(*) * sum(y * y) - sum(y) * sum(y))) as correlation,
(sum(w) * sum(x*y*w) - sum(x*w) * sum(y*w)) /
(sqrt(sum(w) * sum(x*x*w) - sum(x*w) * sum(x*w)) * sqrt(sum(w) * sum(y*y*w)
- sum(y*w) * sum(y*w))) as wCorrelation,
count(*) as n
from d where x is not null and y is not null group by segment
Where w is the weight. I double checked this against R to confirm the results. One may need to cast the data from somedatasource to floating point. I included the unstable versions to warn you against those. (Special thanks goes to Stephan in another answer.)
Update: added weighted correlation