Summary Statistics table with factors and continuous variables

萝らか妹 提交于 2020-06-27 03:55:07

问题


I am trying to create a simple summary statistics table (min, max, mean, n, etc) that handles both factor variables and continuous variables, even when there is more than one factor variable. I'm trying to produce good looking HTML output, eg stargazer or huxtable output.

For a simple reproducible example, I'll use mtcars but change two of the variables to factors, and simplify to three variables.

library(tidyverse)
library(stargazer)

mtcars_df <- mtcars
mtcars_df <- mtcars_df %>% 
  mutate(vs = factor(vs),
         am = factor(am)) %>% 
  select(mpg, vs, am)
head(mtcars_df)

So the data has two factor variables, vs and am. mpg is left as a double:

#>    mpg vs am
#>  <dbl> <fctr> <fctr>
#> 1 21.0  0  1
#> 2 21.0  0  1
#> 3 22.8  1  1
#> 4 21.4  1  0
#> 5 18.7  0  0
#> 6 18.1  1  0

My desired output would look something like this (format only, the numbers aren't all correct for am0):

======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs0       32 0.562   0.504    0     0        1      1 
vs1       32 0.438   0.504    0     0        1      1 
am0       32 0.594   0.499    0     0        1      1 
am1       32 0.406   0.499    0     0        1      1 
------------------------------------------------------

A straight call to stargazer does not handle factors (but we have a solution for summarising one factor, below)

# this doesn't give factors
stargazer(mtcars_df, type = "text")
======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34 
------------------------------------------------------

This previous answer from @jake-fisher works very well to summarise one factor variable. https://stackoverflow.com/a/26935270/8742237

The code below from the previous answer gives both values of the first factor vs, i.e. vs0 and vs1 but when it comes to the second factor, am, it only lists summary statistics for one value of am:

  • am0 is missing.

I do realise that this is because we want to avoid the dummy variable trap when modeling, but my issue is not about modeling, it's about creating a summary table with all values of all factor variables.

options(na.action = "na.pass")  # so that we keep missing values in the data
X <- model.matrix(~ . - 1, data = mtcars_df)
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
#names(X) <- colnames(X)
stargazer(X.df, type = "text")

======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs0       32 0.562   0.504    0     0        1      1 
vs1       32 0.438   0.504    0     0        1      1 
am1       32 0.406   0.499    0     0        1      1 
------------------------------------------------------

While use of stargazer or huxtable would be preferred, if there's an easier way to produce this sort of summary table with a different library, that would still be very helpful.


回答1:


In the end, instead of using model.matrix(), which is designed to drop the base case when creating dummy variables, a simple fix is to use mlr::createDummyFeatures(), which creates a Dummy for all values, even the base case.

library(tidyverse)
library(stargazer)
library(mlr)

mtcars_df <- mtcars
mtcars_df <- mtcars_df %>% 
  mutate(vs = factor(vs),
         am = factor(am)) %>% 
  select(mpg, vs, am)
head(mtcars_df)


X <- mlr::createDummyFeatures(obj = mtcars_df)
X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
#names(X) <- colnames(X)
stargazer(X.df, type = "text")

which does give the desired output:

======================================================
Statistic N   Mean  St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg       32 20.091  6.027   10    15.4     22.8   34 
vs.0      32 0.562   0.504    0     0        1      1 
vs.1      32 0.438   0.504    0     0        1      1 
am.0      32 0.594   0.499    0     0        1      1 
am.1      32 0.406   0.499    0     0        1      1 
------------------------------------------------------


来源:https://stackoverflow.com/questions/62315073/summary-statistics-table-with-factors-and-continuous-variables

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!