Output each factor level as a dummy variable in a stargazer summary statistics table

Asked by 情书的邮戳 on 2020-12-30 15:46

I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table as well. I have a factor variable in my …

  • 2020-12-30 16:16

    Since stargazer can't do this directly, you can build your own summary table as a data frame and output it with pander, xtable, or any other table package. For example, here's how to create a summary table with dplyr and tidyr:

    library(car)    # loads carData, which provides the Blackmore data
    library(dplyr)  # across() needs dplyr >= 1.0.0
    library(tidyr)
    library(pander)

    fancy.summary <- Blackmore %>%
      select(-subject) %>%  # Remove the subject column
      group_by(group) %>%  # Group by patient and control
      summarise(across(everything(),  # Summary statistics for each variable in each group
                       list(mean = mean, sd = sd, min = min, max = max, length = length))) %>%
      mutate(prop = age_length / sum(age_length)) %>%  # Proportion of observations in each group
      gather(variable, value, -group, -prop) %>%  # Convert to long
      separate(variable, c("variable", "statistic")) %>%  # Split e.g. "age_mean" into "age" and "mean"
      mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%
      spread(statistic, value) %>%  # Make the statistics be actual columns
      select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns

    pander(fancy.summary)

    Printing the result with pander(fancy.summary) gives:

     group   variable   n   mean   sd    min   max   prop 
    ------- ---------- --- ------ ----- ----- ----- ------
    control    age     359 11.26  2.698   8   17.92 0.3799
    control  exercise  359 1.641  1.813   0   11.54 0.3799
    patient    age     586 11.55  2.802   8   17.92 0.6201
    patient  exercise  586 3.076  4.113   0   29.96 0.6201
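    If you'd rather not depend on dplyr and tidyr, the same kind of per-group summary data frame can be built in base R. This is a sketch, not part of the original answer: it uses the built-in iris data (with Species as the grouping factor) as a stand-in for car::Blackmore so it runs without extra packages, and the names vars and summary_df are hypothetical.

```r
# Base-R sketch of a per-group summary table.
# Assumption: iris / Species stand in for Blackmore / group.
vars <- c("Sepal.Length", "Petal.Length")  # hypothetical choice of numeric columns

summary_df <- do.call(rbind, lapply(split(iris, iris$Species), function(sub) {
  # one row per (group, variable) pair
  do.call(rbind, lapply(vars, function(v) {
    x <- sub[[v]]
    data.frame(group = as.character(sub$Species[1]), variable = v,
               n = length(x), mean = mean(x), sd = sd(x),
               min = min(x), max = max(x),
               prop = nrow(sub) / nrow(iris))  # share of all observations in this group
  }))
}))
rownames(summary_df) <- NULL  # drop the "setosa.1"-style row names created by rbind

print(summary_df, digits = 4)
```

    The resulting data frame can then be passed to pander(), xtable(), or knitr::kable(), just like the dplyr version.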
  • 2020-12-30 16:18

    The tables package can be useful for this task.

    library(car)     # provides the Blackmore data (via carData)
    library(tables)

    # percent only:
    (x <- tabular(Factor(group, "") ~ (Pct = Percent()) * Format(digits = 4),
                  data = Blackmore))
    ##         Pct  
    ## control 37.99
    ## patient 62.01
    # percent and counts:
    (x <- tabular(Factor(group, "") ~ ((n = 1) + (Pct = Percent())) * Format(digits = 4),
                  data = Blackmore))
    ##         n      Pct   
    ## control 359.00  37.99
    ## patient 586.00  62.01

    Then it's straightforward to output this to LaTeX:

    > latex(x)
      & n & \multicolumn{1}{c}{Pct} \\ 
    control  & $359.00$ & $\phantom{0}37.99$ \\
    patient  & $586.00$ & $\phantom{0}62.01$ \\
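    The counts and percentages in these tables are just the frequencies of the factor levels, so they can be cross-checked in base R with table() and prop.table(). This sketch again uses iris$Species as a stand-in factor so it runs without car or tables installed; the names counts, pct, and freq are hypothetical.

```r
# Counts and percentages of each factor level, as in the tabular() output above.
# Assumption: iris$Species stands in for Blackmore$group.
counts <- table(iris$Species)
pct <- 100 * prop.table(counts)  # percentage of all observations at each level

freq <- data.frame(n = as.vector(counts),
                   Pct = round(as.vector(pct), 2),
                   row.names = names(counts))
freq
```

    For LaTeX output from a plain data frame like this, xtable::xtable() or knitr::kable(format = "latex") are common choices.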
  • 2020-12-30 16:35

    Another workaround is to create the dummy variables in a separate step with model.matrix() and then pass them to stargazer. Using the same example data:

    > library(car)
    > library(stargazer)
    > data(Blackmore)
    > options(na.action = "na.pass")  # keep rows with missing values when building the model matrix
    > X <- model.matrix(~ age + exercise + group - 1, data = Blackmore)  # "- 1" drops the intercept, so every level of group gets its own dummy
    > X.df <- data.frame(X)  # stargazer only makes summary tables from data.frame objects
    > stargazer(X.df, type = "text")
    Statistic     N   Mean  St. Dev.  Min   Max  
    age          945 11.442  2.766   8.000 17.920
    exercise     945 2.531   3.495   0.000 29.960
    groupcontrol 945 0.380   0.486     0     1   
    grouppatient 945 0.620   0.486     0     1   

    Edit: the car::Blackmoor dataset has since been renamed car::Blackmore.
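    This works because the mean of a 0/1 dummy is the share of observations at that level, so the Mean column doubles as a proportion (0.380 for groupcontrol matches the 37.99% from the tables answer). The sketch below uses iris as a stand-in data set, with one value set to NA to show how missing data are handled. Instead of changing the global na.action option as the answer does, it passes na.pass to model.frame() and hands that model frame to model.matrix(); this is an alternative, not the answer's own method. The object names are hypothetical, and stargazer is only called if it is installed.

```r
# Dummy-variable summary without touching options(na.action).
# Assumption: iris stands in for Blackmore; one NA is added for illustration.
df <- iris
df$Sepal.Length[1] <- NA

f <- ~ Sepal.Length + Species - 1  # no intercept, so every Species level gets a dummy
mf <- model.frame(f, data = df, na.action = na.pass)  # keep the row containing the NA
X <- model.matrix(f, data = mf)

# The mean of each 0/1 dummy is the proportion of observations at that level
summ <- data.frame(
  N    = colSums(!is.na(X)),
  Mean = colMeans(X, na.rm = TRUE),
  SD   = apply(X, 2, sd, na.rm = TRUE),
  Min  = apply(X, 2, min, na.rm = TRUE),
  Max  = apply(X, 2, max, na.rm = TRUE)
)
round(summ, 3)

# Same table via stargazer, if it is installed
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(as.data.frame(X), type = "text")
}
```

    Compared with options(na.action = "na.pass"), this keeps the change local to one call, so later model fits in the same session still use the default na.omit.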
