Calculating a linear trend line for every row of a table in R

怎甘沉沦 提交于 2019-12-06 08:33:02

问题


is it somehow possible to conduct a linear regression for every single row of a data frame without using a loop? The output (intercept + slope) of the trend line should be added to the original data frame as new columns.

To make my intention more clearly, I have prepared a very small data example:

day1 <- c(1,3,1)
day2 <- c(2,2,1)
day3 <- c(3,1,5)
output.intercept <- c(0,4,-1.66667)
output.slope <- c(1,-1,2)
data <- data.frame(day1,day2,day3,output.intercept,output.slope)

Input variables are day1-3; let's say those are the sales for different shops on 3 consecutive days. What I want to do is to calculate a linear trend line for the 3 rows and add the output parameters to the origin table (see output.intercept + output.slope) as new columns.

The solution should be very efficient in terms of calculation time since the real data frame has many 100k's of rows.

Best, Christoph


回答1:


design.mat <- cbind(1,1:3)
response.mat <- t(data[,1:3])

reg <- lm.fit(design.mat, response.mat)$coefficients
data <- cbind(data, t(reg))
#  day1 day2 day3 output.intercept output.slope        x1 x2
#1    1    2    3          0.00000            1  0.000000  1
#2    3    2    1          4.00000           -1  4.000000 -1
#3    1    1    5         -1.66667            2 -1.666667  2

However, if you have massive data, it might be necessary to loop due to memory restrictions. If that's the case I would use a long format data.table and use the package's by syntax to loop.




回答2:


Using your data,

day1 <- c(1,3,1)
day2 <- c(2,2,1)
day3 <- c(3,1,5)
output.intercept <- c(0,4,-1.66667)
output.slope <- c(1,-1,2)
dat <- data.frame(day1,day2,day3)

I think you want something like this:

fits <- lm.fit(cbind(1, seq_len(nrow(dat))), t(dat))
t(coef(fits))

Which gives

R> t(coef(fits))
         x1 x2
[1,]  0.000  1
[2,]  4.000 -1
[3,] -1.667  2

These can be added to dat like so

dat <- cbind(dat, t(coef(fits)))
names(dat)[-(1:3)] <- c("Intercept","Slope")

R> dat
  day1 day2 day3 Intercept Slope
1    1    2    3     0.000     1
2    3    2    1     4.000    -1
3    1    1    5    -1.667     2

It would perhaps be easier to store the data the other way, with columns as time series, rather than rows, if you have any control over the way the data are arranged initially as it would avoid having to transpose a large matrix when fitting via lm.fit(). Ideally, you;d want the data arranged like this initially:

     [,1] [,2] [,3]
day1    1    3    1
day2    2    2    1
day3    3    1    5

I.e. the rows as the time points rather than individual series as you have them now. This is because of the way R expects data to be arranged. Note we have to transpose your dat in the lm.fit() call which will entail a copy of a large object. Hence if you can control how these data are arranged/supplied before they get into R, that would help for the large problem.

lm.fit() is used as that is the underlying, lean code used by lm() but we avoid the complexities of parsing the formula and creating model matrices. If you want more efficient, you might have to look to doing the QR decomposition yourself (the code is in lm.fit() to do this) as there are a few things lm.fit() does as sanity checks that you might be able to do away with if you are certain your data won't lead to singular matrices etc.




回答3:


I had the same problem as OP. This solution will work on data with NAs. All of the previous answers generate an error for me in this case:

slp = function(x) {
  y = t(x)
  y = y[!is.na(y)]
  len = length(y):1
  b = cov(y,len)/var(len)
  return(b)}

reg_slp <- apply(data,1,slp)

Only gets the slope, but the intercept could be easily added. I doubt this is particularly efficient, but it was effective in my case.




回答4:


Or like this?

day1 <- c(1,3,1)
day2 <- c(2,2,1)
day3 <- c(3,1,5)
data <- data.frame(day1,day2,day3)
y<-1:3

reg<-apply(data,1,function(x) lm(as.numeric(x)~y))
data[,c("intercept","slope")]<-rbind(reg[[1]]$coef,reg[[2]]$coef,reg[[3]]$coef)


来源:https://stackoverflow.com/questions/21783871/calculating-a-linear-trend-line-for-every-row-of-a-table-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!