Linear Regression and group by in R

抹茶落季 2020-11-22 02:27

I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 sta

10 Answers
  • 2020-11-22 02:55

    Here's an approach using the plyr package:

    # Fake data: each state gets one observation per year (years 1 to 10)
    d <- data.frame(
      state = rep(c('NY', 'CA'), each = 10),
      year = rep(1:10, 2),
      response = rnorm(20)
    )
    
    library(plyr)
    # Break up d by state, then fit the specified model to each piece and
    # return a list
    models <- dlply(d, "state", function(df) 
      lm(response ~ year, data = df))
    
    # Apply coef to each model and return a data frame
    ldply(models, coef)
    
    # Print the summary of each model
    l_ply(models, summary, .print = TRUE)
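
The same split, fit, and combine steps can also be sketched in base R without plyr. This is only an illustration under assumptions: the data frame below is simulated, and the names `models`, `coefs`, and `r2` are chosen for this example.

```r
# Simulated data (an assumption for illustration): 2 states, 10 years each
set.seed(42)
d <- data.frame(
  state = rep(c("NY", "CA"), each = 10),
  year = rep(1:10, 2),
  response = rnorm(20)
)

# Split the data frame by state and fit one model per piece
models <- lapply(split(d, d$state), function(df) lm(response ~ year, data = df))

# Stack the coefficients into a matrix with one row per state
coefs <- do.call(rbind, lapply(models, coef))
print(coefs)

# Pull a single statistic (here R^2) out of each model summary
r2 <- sapply(models, function(m) summary(m)$r.squared)
print(r2)
```

`split()` orders the pieces by the sorted levels of `state`, so the rows of `coefs` come out as CA, then NY.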
    
  • 2020-11-22 02:55

    In my opinion, a mixed linear model is a better approach for this kind of data. In the code below, the fixed effects give the overall trend. The random effects indicate how the trend for each individual state differs from the global trend. The correlation structure takes the temporal autocorrelation into account. Have a look at Pinheiro & Bates (Mixed-Effects Models in S and S-PLUS).

    library(nlme)
    # d: a data frame with columns response, year and state
    lme(response ~ year, random = ~ year | state,
        correlation = corAR1(form = ~ year | state), data = d)
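
To make the split between the overall trend and the state-level deviations concrete, here is a minimal sketch on simulated data. Everything about the data is an assumption made for illustration (20 states, 22 years, a centered year variable called `year_c`), and the AR(1) correlation structure is left out to keep the example small.

```r
library(nlme)

# Simulated panel (an assumption for illustration): 20 states, 22 years,
# each state with its own intercept and slope around a common trend.
set.seed(123)
n_state <- 20
n_year <- 22
sim <- data.frame(
  state = factor(rep(seq_len(n_state), each = n_year)),
  year_c = rep(seq_len(n_year) - mean(seq_len(n_year)), times = n_state)
)
b0 <- rnorm(n_state, mean = 10, sd = 2)     # state-specific intercepts
b1 <- rnorm(n_state, mean = 0.5, sd = 0.3)  # state-specific slopes
idx <- as.integer(sim$state)
sim$response <- b0[idx] + b1[idx] * sim$year_c + rnorm(nrow(sim), sd = 1)

# Random intercept and random slope for each state
fit <- lme(response ~ year_c, random = ~ year_c | state, data = sim)

fixef(fit)        # overall intercept and trend
head(ranef(fit))  # each state's deviation from the overall values
head(coef(fit))   # per-state intercept and slope (fixed + random)
```

`coef(fit)` is usually the table of interest when the goal is one trend per state, since it already adds each state's deviation to the overall estimate.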
    
  • 2020-11-22 02:55

    The lm() function above is a simple example. By the way, I imagine that your data set has columns in the following form:

    year state var1 var2 y...

    In my view, you can use the following code:

    # data = your data frame
    # state is the label of the states column
    modell <- by(data, data$state, function(df) lm(y ~ I(1/var1) + I(1/var2), data = df))
    # Print the summary of each state's model
    lapply(modell, summary)
    
  • 2020-11-22 02:56

    A nice solution using data.table was posted on CrossValidated by @Zach. I'd just add that the coefficient of determination R^2 can also be obtained for each group:

    ## make fake data
    library(data.table)
    set.seed(1)
    dat <- data.table(x=runif(100), y=runif(100), grp=rep(1:2,50))
    
    ## calculate R^2 for each group
    dat[,summary(lm(y~x))$r.squared,by=grp]
       grp         V1
    1:   1 0.01465726
    2:   2 0.02256595
    

    as well as other output from summary(lm), for example the F statistic:

    dat[, {s <- summary(lm(y~x)); list(r2 = s$r.squared, f = s$fstatistic[1])}, by=grp]
       grp         r2        f
    1:   1 0.01465726 0.714014
    2:   2 0.02256595 1.108173
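
The same grouping pattern can return the fitted coefficients themselves. The sketch below reuses the fake data from this answer; the name `coefs` is chosen for this example, and it relies on data.table turning a named list returned from `j` into one column per element.

```r
library(data.table)

# Same fake data as above
set.seed(1)
dat <- data.table(x = runif(100), y = runif(100), grp = rep(1:2, 50))

# One row per group, one column per coefficient (intercept and slope)
coefs <- dat[, as.list(coef(lm(y ~ x))), by = grp]
print(coefs)
```

The result has the columns `grp`, `(Intercept)`, and `x`, so it can be joined back to other per-group tables on `grp`.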
    