I have a dataset with 188 rows and 65 columns relating to World Development Indicators and birth statistics. I am trying to use the purposeful selection method to create a
First, I don't recommend doing this unless you know what you are doing. Otherwise, read up on issues such as selection bias and the false discovery rate.
In the following, I am using the iris dataset and regress the fourth column (Petal.Width) on each of the first three columns, one at a time. You can easily adapt this to your own data.
Using the broom package isn't mandatory. If you don't want it, remove the `tidy` call inside the `lapply` function.
library(broom)
# Fit one simple regression per predictor and tidy each coefficient table
list_out <- lapply(colnames(iris)[1:3], function(i)
  tidy(lm(as.formula(paste("Petal.Width ~", i)), data = iris)))
list_out
# [[1]]
# term estimate std.error statistic p.value
# 1 (Intercept) -3.2002150 0.25688579 -12.45773 8.141394e-25
# 2 Sepal.Length 0.7529176 0.04353017 17.29645 2.325498e-37
#
# [[2]]
# term estimate std.error statistic p.value
# 1 (Intercept) 3.1568723 0.4130820 7.642242 2.474053e-12
# 2 Sepal.Width -0.6402766 0.1337683 -4.786461 4.073229e-06
#
# [[3]]
# term estimate std.error statistic p.value
# 1 (Intercept) -0.3630755 0.039761990 -9.131221 4.699798e-16
# 2 Petal.Length 0.4157554 0.009582436 43.387237 4.675004e-86
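If you would rather avoid the broom dependency entirely, a minimal base R sketch could look like the following. It pulls the coefficient matrix out of `summary()` instead of calling `tidy`. The names `predictors` and `list_base` are only illustrative and not part of the code above.

```r
# Base R sketch: the same univariable fits, without broom
predictors <- colnames(iris)[1:3]

list_base <- lapply(predictors, function(i) {
  fit <- lm(as.formula(paste("Petal.Width ~", i)), data = iris)
  # coef(summary(fit)) is a numeric matrix with the columns
  # "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
  coef(summary(fit))
})

# Name each list element after its predictor for easier lookup
names(list_base) <- predictors

list_base[["Sepal.Length"]]
```

Each list element is a matrix rather than a data.frame, so wrap it in `as.data.frame()` if you need data.frame columns later.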
To combine them into a single data.frame:
do.call(rbind, list_out)
# term estimate std.error statistic p.value
# 1 (Intercept) -3.2002150 0.256885790 -12.457735 8.141394e-25
# 2 Sepal.Length 0.7529176 0.043530170 17.296454 2.325498e-37
# 3 (Intercept) 3.1568723 0.413081984 7.642242 2.474053e-12
# 4 Sepal.Width -0.6402766 0.133768277 -4.786461 4.073229e-06
# 5 (Intercept) -0.3630755 0.039761990 -9.131221 4.699798e-16
# 6 Petal.Length 0.4157554 0.009582436 43.387237 4.675004e-86
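Since the false discovery rate was mentioned at the start, here is a rough sketch of one way to screen the combined table: drop the intercept rows and adjust the slope p-values with the Benjamini-Hochberg method via `p.adjust` from base R. The names `predictors`, `res` and `slopes`, and the choice of "BH", are assumptions for illustration. This is not a full purposeful selection procedure.

```r
library(broom)

# Same univariable fits as above
predictors <- colnames(iris)[1:3]
list_out <- lapply(predictors, function(i)
  tidy(lm(as.formula(paste("Petal.Width ~", i)), data = iris)))

# Stack the per-model tables into one plain data.frame
res <- as.data.frame(do.call(rbind, list_out))

# Keep only the slope rows (one per predictor)
slopes <- res[res$term != "(Intercept)", ]

# Benjamini-Hochberg adjustment across the univariable tests
slopes$p.adjusted <- p.adjust(slopes$p.value, method = "BH")

# Order predictors from smallest to largest adjusted p-value
slopes[order(slopes$p.adjusted), ]
```

With 65 columns, the adjustment matters far more than in this three-predictor toy example, because the number of tests is much larger.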