I have some data in a SQL database and I'd like to calculate the slope. The data has this layout:
Date | Keyword | Score
2012-01-10 | ipad
If you're defining slope as just the slope from the earliest point to the latest point, and if score only increases with date, then you can get the output above with this:
SELECT *
FROM scores
JOIN
(SELECT keyword,
(MAX(score)-MIN(score)) / DATEDIFF(MAX(date),MIN(date)) AS slope
FROM scores
GROUP BY keyword) a
USING(keyword);
However if you want linear regression, or if scores can decrease as well as increase with time, you'll need something more complex.
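To see why the monotonic assumption matters, here is a rough Python sketch (not part of the original answer) that applies both ideas to the water bottle rows from the result table later in this thread. The helper names endpoint_slope and max_min_slope are made up for illustration.

```python
from datetime import date

# Water bottle rows copied from the result table in this thread: (date, score).
rows = [
    (date(2012, 1, 22), 0.010885442),
    (date(2012, 1, 23), 0.011203949),
    (date(2012, 1, 24), 0.008460835),
    (date(2012, 1, 25), 0.010363991),
    (date(2012, 1, 26), 0.011800716),
    (date(2012, 1, 27), 0.012948411),
    (date(2012, 1, 28), 0.012732459),
    (date(2012, 1, 29), 0.011682568),
]


def endpoint_slope(rows):
    """Score change per day between the earliest and the latest date."""
    ordered = sorted(rows)  # tuples sort by date first
    (first_day, first_score), (last_day, last_score) = ordered[0], ordered[-1]
    return (last_score - first_score) / (last_day - first_day).days


def max_min_slope(rows):
    """What a MAX/MIN query computes: score range divided by date range."""
    days = [d for d, _ in rows]
    scores = [s for _, s in rows]
    return (max(scores) - min(scores)) / (max(days) - min(days)).days


print(endpoint_slope(rows))  # slope between first and last row
print(max_min_slope(rows))   # slope implied by MAX(score) - MIN(score)
```

Because these scores go down as well as up, the two numbers disagree (roughly 0.000114 versus 0.000641 per day), so the MAX/MIN shortcut only matches the endpoint slope when the score never decreases.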
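In the regression query in this thread, if casting the date column to float fails (SQL Server rejects the cast for the date type, though not for datetime), use this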
SUM(CONVERT(float, datediff(dd, '1/1/1900', date_field)))
instead of
SUM(CAST(date_field AS float))
The cleanest one I could make:
SELECT
Scores.Date, Scores.Keyword, Scores.Score,
(N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope
FROM Scores
INNER JOIN (
SELECT
Keyword,
COUNT(*) AS N,
SUM(CAST(Date as float)) AS Sum_X,
SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2,
SUM(Score) AS Sum_Y,
SUM(Score*Score) AS Sum_Y2,
SUM(CAST(Date as float) * Score) AS Sum_XY
FROM Scores
GROUP BY Keyword
) G ON G.Keyword = Scores.Keyword;
It uses Simple Linear Regression to calculate the slope.
Result:
Date | Keyword | Score | Slope
2012-01-22 | water bottle | 0,010885442 | 0,000334784345222076
2012-01-23 | water bottle | 0,011203949 | 0,000334784345222076
2012-01-24 | water bottle | 0,008460835 | 0,000334784345222076
2012-01-25 | water bottle | 0,010363991 | 0,000334784345222076
2012-01-26 | water bottle | 0,011800716 | 0,000334784345222076
2012-01-27 | water bottle | 0,012948411 | 0,000334784345222076
2012-01-28 | water bottle | 0,012732459 | 0,000334784345222076
2012-01-29 | water bottle | 0,011682568 | 0,000334784345222076
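As a cross-check, the same sums can be reproduced outside the database. This is only a sketch in Python, assuming dates are turned into day numbers with date.toordinal() (a stand-in for the CAST in the query); slope_from_sums is a hypothetical helper name that mirrors the query's column aliases.

```python
from datetime import date

# Water bottle rows from the result table above: (date, score).
rows = [
    (date(2012, 1, 22), 0.010885442),
    (date(2012, 1, 23), 0.011203949),
    (date(2012, 1, 24), 0.008460835),
    (date(2012, 1, 25), 0.010363991),
    (date(2012, 1, 26), 0.011800716),
    (date(2012, 1, 27), 0.012948411),
    (date(2012, 1, 28), 0.012732459),
    (date(2012, 1, 29), 0.011682568),
]


def slope_from_sums(xs, ys):
    """Least-squares slope built from the same sums the query uses."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x)


# Assumption: one unit of x is one day, counted from an arbitrary origin.
xs = [d.toordinal() for d, _ in rows]
ys = [score for _, score in rows]

slope = slope_from_sums(xs, ys)
print(slope)

# Shifting the origin (days since the first row) leaves the slope unchanged,
# up to floating-point rounding.
shifted = slope_from_sums([x - xs[0] for x in xs], ys)
print(shifted)
```

With these eight rows the slope comes out at about 0.000335 per day, in line with the Slope column. The origin shift is why counting days from a fixed date such as 1900-01-01 gives the same slope as any other day-based number.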
Every database system seems to have a different approach to converting dates to numbers:
MySQL: TO_SECONDS(date) or TO_DAYS(date)
Oracle: TO_NUMBER(TO_CHAR(date, 'J')) or date - TO_DATE('1','yyyy')
MS SQL: CAST(date AS float) (or equivalent CONVERT)
Casting to decimal does not give correct results for me; the result is not linear in the dates. Using TO_DAYS(date_field) instead gives correct results.
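One plausible explanation for the decimal problem, sketched in Python: assuming the cast produces a YYYYMMDD-style number (which is how MySQL treats a DATE in numeric context), consecutive days are not always one unit apart. The function names below are invented for the illustration.

```python
from datetime import date


def as_yyyymmdd(d):
    """Mimic a YYYYMMDD-style numeric cast of a date (assumed behaviour)."""
    return d.year * 10000 + d.month * 100 + d.day


def as_day_number(d):
    """A true day count, comparable in spirit to TO_DAYS(date)."""
    return d.toordinal()


jan31, feb01 = date(2012, 1, 31), date(2012, 2, 1)

# One calendar day apart, but the YYYYMMDD numbers jump across the month end.
print(as_yyyymmdd(feb01) - as_yyyymmdd(jan31))      # 70
print(as_day_number(feb01) - as_day_number(jan31))  # 1
```

Any regression over the YYYYMMDD numbers would treat that month boundary as a 70-unit gap, which distorts the slope. A day count keeps the x-axis evenly spaced.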