Running multiple, simple linear regressions from dataframe in R

Happy的楠姐 2021-02-06 12:04

I have a dataset (a data frame) with 5 columns, all containing numeric values.

I'm looking to run a simple linear regression for each pair of columns in the dataset.

For ex

3 Answers
  • 2021-02-06 12:48

    Here's one solution using combn():

     combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)
    

    Example:

    set.seed(1)
    DF <- data.frame(A=rnorm(50, 100, 3),
                     B=rnorm(50, 100, 3),
                     C=rnorm(50, 100, 3),
                     D=rnorm(50, 100, 3),
                     E=rnorm(50, 100, 3))
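    The one-liner works because lm() accepts a data frame in place of a formula and then treats the first column as the response and the remaining columns as predictors, so lm(DF[, c("A", "B")]) fits A ~ B. A quick check, re-creating the simulated DF from the example (fit_df and fit_formula are illustrative names):

```r
# Re-create the example data
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# A two-column data frame passed as the first argument: first column = response
fit_df      <- lm(DF[, c("A", "B")])
# The same model written with an explicit formula
fit_formula <- lm(A ~ B, data = DF)

all.equal(coef(fit_df), coef(fit_formula))
```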
    

    Update: added @Henrik's suggestion (see comments)

    # only the coefficients
    > results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
    > vars <- combn(names(DF), 2)
    > names(results) <- vars[1 , ] # name each model by its response (first) variable
    > results
    $A
     (Intercept)            B 
    103.66739418  -0.03354243 
    
    $A
    (Intercept)           C 
    97.88341555  0.02429041 
    
    $A
    (Intercept)           D 
    122.7606103  -0.2240759 
    
    $A
    (Intercept)           E 
    99.26387487  0.01038445 
    
    $B
     (Intercept)            C 
    99.971253525  0.003824755 
    
    $B
     (Intercept)            D 
    102.65399702  -0.02296721 
    
    $B
    (Intercept)           E 
    96.83042199  0.03524868 
    
    $C
    (Intercept)           D 
     80.1872211   0.1931079 
    
    $C
    (Intercept)           E 
     89.0503893   0.1050202 
    
    $D
     (Intercept)            E 
    107.84384655  -0.07620397 
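    Naming the list by vars[1, ] alone gives repeated names ($A appears four times). A minimal sketch, assuming the same simulated DF, that labels each model as "response ~ predictor" and gathers the coefficients into one data frame (coef_table and its column names are illustrative choices, not part of the answer):

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

vars <- combn(names(DF), 2)                       # 2 x 10 matrix of name pairs
results <- combn(names(DF), 2,
                 function(x) coef(lm(DF[, x])),   # first name = response
                 simplify = FALSE)
names(results) <- paste(vars[1, ], vars[2, ], sep = " ~ ")   # e.g. "A ~ B"

# One row per model: which variables were used, plus intercept and slope
coef_table <- data.frame(
  response  = vars[1, ],
  predictor = vars[2, ],
  intercept = unname(vapply(results, function(b) b[[1]], numeric(1))),
  slope     = unname(vapply(results, function(b) b[[2]], numeric(1)))
)
coef_table
```

    Since combn() yields each unordered pair only once, this covers A ~ B but not B ~ A; fit the reversed pairs as well if you need both directions.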
    
  • 2021-02-06 12:52

    I would also recommend looking at the correlation matrix (cor(DF)), which is usually the best way to discover linear relationships between variables. The correlation is closely linked to the covariance and to the slopes of a simple linear regression. The computation below illustrates this link.

    Sample data:

    set.seed(1)
    DF <- data.frame(
      A=rnorm(50, 100, 3),
      B=rnorm(50, 100, 3),
      C=rnorm(50, 100, 3),
      D=rnorm(50, 100, 3),
      E=rnorm(50, 100, 3)
    )
    

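    The regression slope is cov(x, y) / var(x). Multiplying cov(DF) by 1 / diag(var(DF)) divides each row by the variance of that row's variable, so entry [i, j] is the slope of column j regressed on column i; for example, beta["B", "A"] = -0.03354243 is the slope of lm(A ~ B) from the first answer.

    beta <- cov(DF) * (1 / diag(var(DF)))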
    
                A            B           C           D           E
    A  1.00000000 -0.045548503 0.028448192 -0.32982367  0.01800795
    B -0.03354243  1.000000000 0.003298708 -0.02489518  0.04501362
    C  0.02429041  0.003824755 1.000000000  0.24269838  0.15550116
    D -0.22407592 -0.022967212 0.193107904  1.00000000 -0.08977834
    E  0.01038445  0.035248685 0.105020194 -0.07620397  1.00000000
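    The link to the correlation matrix can be made explicit: the slope of y on x is also cor(x, y) * sd(y) / sd(x). A minimal sketch, assuming the same simulated DF (beta_cov and beta_cor are illustrative names):

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

s <- vapply(DF, sd, numeric(1))             # standard deviation of each column
beta_cov <- cov(DF) * (1 / diag(var(DF)))   # [i, j] = cov(i, j) / var(i)
beta_cor <- cor(DF) * outer(1 / s, s)       # [i, j] = cor(i, j) * sd(j) / sd(i)

all.equal(beta_cor, beta_cov)
```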
    

    The intercept is mean(y) - beta * mean(x). Because x runs along the rows of beta and y along its columns, mean(x) has to be recycled by row and mean(y) by column; sweep() takes care of the column part:

    m <- colMeans(DF)
    alpha <- sweep(-beta * m, 2, m, "+")   # alpha[i, j] = mean(col j) - beta[i, j] * mean(col i)

    The diagonal is zero up to rounding error, and the lower triangle reproduces the intercepts from the first answer; for example, alpha["B", "A"] equals the intercept of lm(A ~ B), 103.66739418.
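    To confirm that closed-form slopes and intercepts agree with lm(), here is a minimal sketch, assuming the same simulated DF. It pairs mean(y) with the columns of beta and mean(x) with its rows via sweep(), then compares against lm(y ~ x) for every ordered pair of distinct columns; the loop stops with an error if any pair disagrees.

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

m <- colMeans(DF)
beta  <- cov(DF) * (1 / diag(var(DF)))   # beta[i, j]: slope of column j on column i
alpha <- sweep(-beta * m, 2, m, "+")     # alpha[i, j] = mean(j) - beta[i, j] * mean(i)

# Row = predictor x, column = response y; check every ordered pair against lm()
for (x in names(DF)) {
  for (y in setdiff(names(DF), x)) {
    fit <- coef(lm(DF[[y]] ~ DF[[x]]))
    stopifnot(isTRUE(all.equal(unname(fit), c(alpha[x, y], beta[x, y]))))
  }
}
```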
    
  • 2021-02-06 13:04

    Use combn() to get all combinations of column names (the example below assumes you only want pairs of columns) and Map() to loop over them.

    Example using the built-in mtcars data:

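    colc <- names(mtcars)
    colcc <- combn(colc, 2)      # 2 x 55 matrix, one column per pair of names
    colcc <- data.frame(colcc)
    # one model per column of colcc: colcc[1, x] ~ colcc[2, x]
    kk <- Map(function(x) lm(as.formula(paste(colcc[1, x], "~", paste(colcc[2, x], collapse = "+"))), data = mtcars),
              as.list(1:ncol(colcc)))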
    
     head(kk)
    [[1]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)          cyl  
         37.885       -2.876  
    
    
    [[2]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)         disp  
       29.59985     -0.04122  
    
    
    [[3]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)           hp  
       30.09886     -0.06823  
    
    
    [[4]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)         drat  
         -7.525        7.678  
    
    
    [[5]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)           wt  
         37.285       -5.344  
    
    
    [[6]]
    
    Call:
    lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
        x], collapse = "+"))), data = mtcars)
    
    Coefficients:
    (Intercept)         qsec  
         -5.114        1.412  
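    Pasting strings into a formula works, but stats::reformulate() can build the formula directly from the two names. A minimal sketch of that alternative, using lapply() in place of Map() (name_pairs, fits and slopes are illustrative names); the slopes should agree with the Coefficients printed above, e.g. about -2.876 for mpg ~ cyl:

```r
name_pairs <- combn(names(mtcars), 2)    # 2 x 55 matrix of column-name pairs

# Fit name_pairs[1, i] ~ name_pairs[2, i] for every pair i
fits <- lapply(seq_len(ncol(name_pairs)), function(i) {
  f <- reformulate(name_pairs[2, i], response = name_pairs[1, i])   # e.g. mpg ~ cyl
  lm(f, data = mtcars)
})
names(fits) <- paste(name_pairs[1, ], name_pairs[2, ], sep = " ~ ")

# Slope (second coefficient) of each model, named by its formula
slopes <- vapply(fits, function(fit) coef(fit)[[2]], numeric(1))
head(round(slopes, 3))
```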
    