Running multiple simple linear regressions from a nested dataframe/tibble

Submitted by 折月煮酒 on 2021-01-27 22:50:47

Question


I am trying to run multiple simple linear regressions on data from a nested data frame and store the fitted coefficients in a data frame using tidy(). My code is as follows:

library(tidyverse)  # attaches dplyr, tidyr and purrr
library(broom)      # tidy()
library(reshape2)   # melt()

Factors <- as.factor(c("A","B","C","D"))
set.seed(5)
DF <- data.frame(Factors, X = rnorm(4), Y = rnorm(4), Z= rnorm(4))
MDF <- melt(DF, id.vars=c("Factors","X"))
DFF <- MDF %>% nest(-Factors)

If the data are in a single data frame with many columns, I can run multiple simple linear regressions using:

MDF %>% group_by(variable) %>% do(tidy(lm(value ~ X, data =.)))

Or, if it is a nested data frame and I only need to run one simple linear regression, I can try:

MDF %>% nest(-Factors) %>%
  mutate(fit = map(data, ~lm(Y ~ X, data = .)), results = map(fit, tidy)) %>%
  unnest(results)

But what I need is a combination of the two cases above: I need to run multiple simple linear regressions on data held in a nested data frame.


Answer 1:


You could nest by both grouping variables:

MDF %>% nest(-Factors, -variable) %>% 
  mutate(fit = map(data, ~lm(value ~ X, data = .)), 
         results = map(fit,tidy)) %>% 
  unnest(results)
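
Note that in the example data each Factors/variable combination holds a single row, so a slope cannot be estimated there. The following is only a sketch of the same nest-by-both-groups idea on simulated data with several rows per group; the names sim and coefs, the 10 rows per group, and the true slope of 2 are assumptions made for illustration, and group_by() followed by nest() is used here in place of nest(-Factors, -variable).

```r
library(tidyverse)
library(broom)

set.seed(5)

# Hypothetical long-format data: 4 factor levels x 2 variables x 10 rows each
sim <- expand.grid(Factors = c("A", "B", "C", "D"),
                   variable = c("Y", "Z"),
                   rep = 1:10,
                   stringsAsFactors = FALSE) %>%
  mutate(X = rnorm(n()),
         value = 2 * X + rnorm(n()))

# One lm() per Factors/variable combination, coefficients collected by tidy()
coefs <- sim %>%
  group_by(Factors, variable) %>%
  nest() %>%
  mutate(fit = map(data, ~ lm(value ~ X, data = .x)),
         results = map(fit, tidy)) %>%
  ungroup() %>%
  select(Factors, variable, results) %>%
  unnest(results)

coefs
```

Each of the 8 groups contributes two rows to coefs, one for the intercept and one for the X term.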

You could also use split and avoid nesting:

split(MDF, list(MDF$Factors, MDF$variable)) %>% 
  map_df(~ tidy(lm(value ~ X, data=.x)) %>% 
           mutate(Factors=.x$Factors[1],
                  variable=.x$variable[1]))

Or, if you don't mind the group identifiers in a single column:

split(MDF, list(MDF$Factors, MDF$variable), sep="_") %>% 
  map_df(~ tidy(lm(value ~ X, data=.x)), .id="Factors_variable")


Source: https://stackoverflow.com/questions/49121135/running-multiple-simple-linear-regressions-from-a-nested-dataframe-tibble
