Tabulate coefficients from lm

失恋的感觉 2021-01-25 17:32

I have 10 linear models where I only need some information, namely: r-squared, p-value, coefficients of slope and intercept. I managed to extract these values (via ridiculously

5 answers
  •  迷失自我
    2021-01-25 18:02

    Here is a solution in only a couple of lines:

    library(tidyverse)
    library(broom)
    # Group the data so that one model is fitted per CatChro
    df_g <- df %>% group_by(CatChro)
    # Coefficients: one row per group, one column per term
    df_g %>% do(tidy(lm(Qend ~ Rainfall, data = .))) %>%
      select(CatChro, term, estimate) %>% spread(term, estimate) %>%
      # Model-level statistics, joined on the group key
      left_join(df_g %>% do(glance(lm(Qend ~ Rainfall, data = .))) %>%
                  select(CatChro, r.squared, adj.r.squared, p.value),
                by = "CatChro")
    

    And the result will be:

    # A tibble: 10 x 6
    # Groups:   CatChro [?]
       CatChro `(Intercept)` Rainfall r.squared adj.r.squared  p.value
     1 A3D1          0.0119  0.000409    0.281        0.254   0.00312 
     2 A3D2          0.0236  0.000543    0.0338       0.00543 0.283   
     3 A3D3          0.0221  0.000145    0.0429       0.00297 0.310   
     4 A3D4          0.00930 0.000661    0.372        0.350   0.000344
     5 A3D5          0.0143  0.000108    0.0441      -0.00899 0.374   
     6 A3G1          0.0244  0.000115    0.0363       0.0116  0.233   
     7 A3G2          0.0261  0.000458    0.0645       0.0411  0.105   
     8 A3G3          0.0435  0.000696    0.0759       0.0544  0.0670  
     9 A3G4          0.0237  0.000644    0.173        0.155   0.00324 
    10 A3G5          0.0260  0.000666    0.172        0.150   0.00774 
    

    So, how does this work?

    The following creates a dataframe with all coefficients and the corresponding statistics (tidy turns the result of lm into a dataframe):

    df_g %>%
      do(tidy(lm(Qend ~ Rainfall, data = .)))
    
    # A tibble: 20 x 6
    # Groups:   CatChro [10]
       CatChro term        estimate std.error statistic      p.value
     1 A3D1    (Intercept) 0.0119   0.00358       3.32  0.00258     
     2 A3D1    Rainfall    0.000409 0.000126      3.25  0.00312     
     3 A3D2    (Intercept) 0.0236   0.00928       2.54  0.0157      
     4 A3D2    Rainfall    0.000543 0.000498      1.09  0.283       
    

    I understand that you want to have the intercept and the coefficient on Rainfall as individual columns, so let's "spread" them out. This is achieved by first selecting the relevant columns, and then invoking tidyr::spread, as in

    select(CatChro, term, estimate) %>% spread(term, estimate)
    

    This gives you:

    df_g %>% do(tidy(lm(Qend ~ Rainfall, data = .))) %>% 
      select(CatChro, term, estimate) %>% spread(term, estimate)
    
    # A tibble: 10 x 3
    # Groups:   CatChro [10]
       CatChro `(Intercept)` Rainfall
     1 A3D1          0.0119  0.000409
     2 A3D2          0.0236  0.000543
     3 A3D3          0.0221  0.000145
     4 A3D4          0.00930 0.000661
    

    glance gives you the summary statistics you are looking for, one row per model. The models are indexed by group, here CatChro, so it is easy to merge these rows onto the previous dataframe with left_join, which is what the rest of the code does.
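
    To make the glance step and the join easier to follow, here is a minimal sketch that builds the two pieces separately and then merges them. The data frame `df` is simulated (three made-up groups, random rainfall values), purely so that the snippet runs on its own; the names `stats`, `coefs` and `result` are also just illustrative.

    ```r
    library(dplyr)
    library(tidyr)
    library(broom)

    # Simulated stand-in for the asker's data (an assumption, not the real data).
    set.seed(1)
    df <- data.frame(
      CatChro  = rep(c("A3D1", "A3D2", "A3D3"), each = 20),
      Rainfall = runif(60, 0, 100)
    )
    df$Qend <- 0.02 + 0.0005 * df$Rainfall + rnorm(60, sd = 0.01)

    df_g <- df %>% group_by(CatChro)

    # Model-level statistics: glance() returns one row per fitted model.
    stats <- df_g %>%
      do(glance(lm(Qend ~ Rainfall, data = .))) %>%
      select(CatChro, r.squared, adj.r.squared, p.value)

    # Coefficients: tidy() returns one row per term, spread to one row per group.
    coefs <- df_g %>%
      do(tidy(lm(Qend ~ Rainfall, data = .))) %>%
      select(CatChro, term, estimate) %>%
      spread(term, estimate)

    # Both tables have one row per CatChro, so the join is one-to-one.
    result <- left_join(coefs, stats, by = "CatChro")
    print(result)
    ```

    Note that with a single predictor, the overall p.value reported by glance is the same as the p-value of the slope in the tidy output, which is why 0.00312 appears for A3D1 in both tables above.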
