How to create count and percentage tables and linegraphs with 1 independent variable and 3 dependent ones

问题

I'm an R neophyte, and somehow this problem seems like it should be trivial to solve. But unfortunately, I haven't been able to do so after about three days of searching and experimenting.

My data is in a form close to wideform:

color   agegroup    sex     ses
red     2           Female  A
blue    2           Female  C
green   5           Male    D
red     3           Female  A
red     2           Male    B
blue    1           Female  B
...

I'm trying to create presentable tables with counts and percentages of the dependent variable (color here) organized by sex, ses and agegroup. I need one table organized by ses and sex for each agegroup, with counts next to the percentages, like this:

agegroup:                                  1
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

agegroup:                               2
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

I've been trying to do this with everything from datatables and expss to gmodels, but I just can't figure out how to get output like this. CrossTables from gmodels comes closest, but it's still pretty far away -- (1) it puts percentages under counts, (2) I can't get it to nest sel under sex, (3) I can't figure out how to get it to disgregate the results by generation, and (4) the output is full of dashes, vertical pipes and spaces which make putting it into a word processor or spreadsheet an error-prone manual affair.

EDIT: I removed my second question (about line plots), because the answer to the first question is perfect and deserves credit, even if it doesn't touch on the second one. I'll ask the second question separately, as I should have from the start.

回答1:

The closest result with expss package:

library(expss)
# generate example data
set.seed(123)
N = 300
df = data.frame(
    color = sample(c("red", "blue", "green"), size = N, replace = TRUE),
    agegroup = sample(1:5, size = N, replace = TRUE),
    sex = sample(c("Male", "Female"), size = N, replace = TRUE),
    ses = sample(c("A", "B", "C", "D"),  size = N, replace = TRUE),
    stringsAsFactors = FALSE
)

# redirect output to RStudio HTML viewer
expss_output_viewer()
res = df %>% 
    tab_cells("|" = color) %>% # dependent variable, "|" used to suppress label
    tab_cols(sex %nest% ses) %>% # column variable
    tab_rows(agegroup) %>% 
    tab_total_row_position("none") %>% # we don't need total
    tab_stat_cases(label = "Cases") %>% # calculate cases
    tab_stat_cpct(label = "%") %>% # calculate percent
    tab_pivot(stat_position = "inside_columns") %>% # finalize table
    make_subheadings(number_of_columns = 2)

# difficult part - add percent sign
for(i in grep("%", colnames(res))){
    res[[i]] = ifelse(trimws(res[[i]])!="", 
                      paste0(round(res[[i]], 1), "%"),
                      res[[i]] 
                      )
}

# additionlly remove stat labels
colnames(res) = gsub("\\|Cases|%", "", colnames(res), perl = TRUE)

res

In the RStudio Viewer result will be in the HTML format (see image). Unfortunately, I can't test how it will be pasted to the MS Word. Disclaimer: I am an author of expss package.

来源：https://stackoverflow.com/questions/52340905/how-to-create-count-and-percentage-tables-and-linegraphs-with-1-independent-vari

标签

expss