R: repeat linear regression for all variables and save results in a new data frame

醉话见心 2021-01-28 17:03

I have a data frame named “dat” with 10 numeric variables (var1, var2, var3, var4, var5, … var10), each with several observations…

dat

   var1 var2 var3 var4 var         


        
3 Answers
  •  清歌不尽
    2021-01-28 17:46

    dat <- structure(list(var1 = c(12L, 3L, 13L, 17L, 9L, 15L, 12L, 3L, 
    13L), var2 = c(5L, 2L, 15L, 11L, 13L, 6L, 5L, 2L, 15L), var3 = c(18L, 
    10L, 14L, 16L, 8L, 20L, 18L, 10L, 14L), var4 = c(19L, 6L, 13L, 
    18L, 8L, 17L, 19L, 6L, 13L), var5 = c(12L, 13L, 1L, 10L, 7L, 
    3L, 12L, 13L, 1L), var6 = c(17L, 17L, 17L, 17L, 17L, 17L, 17L, 
    17L, 17L), var7 = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L
    ), var8 = c(16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L), var9 = c(18L, 
    18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L), var10 = c(10L, 10L, 
    10L, 10L, 10L, 10L, 10L, 10L, 10L)), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9"))
    

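    We first write a function that returns all the statistics you need. Note that in a simple linear regression (one predictor), R-squared is the square of the Pearson correlation coefficient, so you don't need the linear model to get it. The model is only needed for the slope, which is the coefficient of the predictor.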

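    # x, y: column names (strings) of the predictor and the response
    # DATA: the data frame that holds both columns
    STATS = function(x,y,DATA){
     # Pearson correlation test between response and predictor
     COR = cor.test(DATA[,y],DATA[,x])
     # Simple linear regression of y on x
     MODEL = summary(lm(DATA[,y]~DATA[,x]))
     data.frame(
     VAR=x,
     PEARSON_COR=as.numeric(COR$estimate),
     PVAL=COR$p.value,
     RSQ=as.numeric(COR$estimate^2),
     # Row 2 is the predictor, column 1 is its estimate (the slope)
     SLOPE = MODEL$coefficients[2,1],
     stringsAsFactors=FALSE
     )
    }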
    

    We test it with var2 as the predictor and var1 as the response:

    STATS("var2","var1",dat)
    
         VAR PEARSON_COR      PVAL      RSQ     SLOPE
    1 var2   0.5668721 0.1114741 0.321344 0.5251232
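
The RSQ and SLOPE values above can be cross-checked without going through STATS. The following is a minimal sketch, assuming the var1 and var2 values from the dput() output; the names x, y, r and fit are illustrative only. It compares the model's R-squared with the squared correlation, and the fitted slope with r * sd(y) / sd(x), which is the standard identity for a one-predictor regression.

```r
# var1 (response) and var2 (predictor), copied from the dat data frame
y <- c(12, 3, 13, 17, 9, 15, 12, 3, 13)
x <- c(5, 2, 15, 11, 13, 6, 5, 2, 15)

r   <- cor(x, y)     # Pearson correlation coefficient
fit <- lm(y ~ x)     # simple linear regression of var1 on var2

# R-squared reported by the model vs. the squared correlation
print(all.equal(summary(fit)$r.squared, r^2))

# Fitted slope vs. r * sd(y) / sd(x)
print(all.equal(unname(coef(fit)[2]), r * sd(y) / sd(x)))
```

Both comparisons should print TRUE, which is why the function can take RSQ from the correlation test alone.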
    

    As an example, we run it on var2, var3 and var4 and combine the results into one data frame. Note that I did not try var6 to var10: each of them holds a single repeated value, so the correlation with var1 is undefined.

    results = do.call(rbind,
    lapply(c("var2","var3","var4"),function(i)STATS(i,"var1",dat)))
    results
    
        VAR PEARSON_COR        PVAL       RSQ     SLOPE
    1 var2   0.5668721 0.111474101 0.3213440 0.5251232
    2 var3   0.7328421 0.024699805 0.5370575 0.8630573
    3 var4   0.8450726 0.004127542 0.7141477 0.7660377
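
To cover every usable variable rather than a hand-picked list, the predictors can be selected programmatically. This is a rough sketch under some assumptions: dat is rebuilt compactly from the dput() values, STATS is repeated from the answer so the block runs on its own, and the names response, candidates, is_varying, predictors and results_all are made up for illustration. Constant columns are dropped before the loop because their correlation with var1 is undefined.

```r
# Data from the answer's dput() output; var6 to var10 are constants
dat <- data.frame(
  var1 = c(12, 3, 13, 17, 9, 15, 12, 3, 13),
  var2 = c(5, 2, 15, 11, 13, 6, 5, 2, 15),
  var3 = c(18, 10, 14, 16, 8, 20, 18, 10, 14),
  var4 = c(19, 6, 13, 18, 8, 17, 19, 6, 13),
  var5 = c(12, 13, 1, 10, 7, 3, 12, 13, 1),
  var6 = 17, var7 = 11, var8 = 16, var9 = 18, var10 = 10
)

# Same function as in the answer
STATS <- function(x, y, DATA) {
  COR <- cor.test(DATA[, y], DATA[, x])
  MODEL <- summary(lm(DATA[, y] ~ DATA[, x]))
  data.frame(
    VAR = x,
    PEARSON_COR = as.numeric(COR$estimate),
    PVAL = COR$p.value,
    RSQ = as.numeric(COR$estimate^2),
    SLOPE = MODEL$coefficients[2, 1],
    stringsAsFactors = FALSE
  )
}

response   <- "var1"
candidates <- setdiff(names(dat), response)

# Keep only predictors with more than one distinct value
is_varying <- vapply(dat[candidates],
                     function(v) length(unique(v)) > 1,
                     logical(1))
predictors <- candidates[is_varying]

# One row of statistics per remaining predictor
results_all <- do.call(rbind,
                       lapply(predictors, function(p) STATS(p, response, dat)))
results_all
```

With this data the filter keeps var2 to var5, so results_all has four rows; the first three should match the table above.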
    

    If you are familiar with tidyverse and purrr, you can do the following:

    library(dplyr)
    library(purrr)
    c("var2","var3","var4") %>% map_dfr(STATS,"var1",dat)
    
