问题
I'm an R neophyte, and somehow this problem seems like it should be trivial to solve. But unfortunately, I haven't been able to do so after about three days of searching and experimenting.
My data is in a form close to wideform:
color agegroup sex ses
red 2 Female A
blue 2 Female C
green 5 Male D
red 3 Female A
red 2 Male B
blue 1 Female B
...
I'm trying to create presentable tables with counts and percentages of the dependent variable (color
here) organized by sex
, ses
and agegroup
. I need one table organized by ses
and sex
for each agegroup
, with counts next to the percentages, like this:
agegroup: 1
sex: Female Male
ses: A B C D A B C D
color:
red 2 1% 0 0% 8 4% 22 11% 16 8% 2 1% 8 4% 3 1.5%
blue 9 4.5% 6 3% 4 2% 2 1% 12 6% 32 16% 14 7% 6 3%
green 4 2% 12 6% 2 1% 8 4% 0 0% 22 11% 40 20% 0 0%
agegroup: 2
sex: Female Male
ses: A B C D A B C D
color:
red 2 1% 0 0% 8 4% 22 11% 16 8% 2 1% 8 4% 3 1.5%
blue 9 4.5% 6 3% 4 2% 2 1% 12 6% 32 16% 14 7% 6 3%
green 4 2% 12 6% 2 1% 8 4% 0 0% 22 11% 40 20% 0 0%
I've been trying to do this with everything from datatables
and expss
to gmodels
, but I just can't figure out how to get output like this. CrossTables
from gmodels
comes closest, but it's still pretty far away -- (1) it puts percentages under counts, (2) I can't get it to nest sel
under sex
, (3) I can't figure out how to get it to disgregate the results by generation, and (4) the output is full of dashes, vertical pipes and spaces which make putting it into a word processor or spreadsheet an error-prone manual affair.
EDIT: I removed my second question (about line plots), because the answer to the first question is perfect and deserves credit, even if it doesn't touch on the second one. I'll ask the second question separately, as I should have from the start.
回答1:
The closest result with expss
package:
library(expss)
# generate example data
set.seed(123)
N = 300
df = data.frame(
color = sample(c("red", "blue", "green"), size = N, replace = TRUE),
agegroup = sample(1:5, size = N, replace = TRUE),
sex = sample(c("Male", "Female"), size = N, replace = TRUE),
ses = sample(c("A", "B", "C", "D"), size = N, replace = TRUE),
stringsAsFactors = FALSE
)
# redirect output to RStudio HTML viewer
expss_output_viewer()
res = df %>%
tab_cells("|" = color) %>% # dependent variable, "|" used to suppress label
tab_cols(sex %nest% ses) %>% # column variable
tab_rows(agegroup) %>%
tab_total_row_position("none") %>% # we don't need total
tab_stat_cases(label = "Cases") %>% # calculate cases
tab_stat_cpct(label = "%") %>% # calculate percent
tab_pivot(stat_position = "inside_columns") %>% # finalize table
make_subheadings(number_of_columns = 2)
# difficult part - add percent sign
for(i in grep("%", colnames(res))){
res[[i]] = ifelse(trimws(res[[i]])!="",
paste0(round(res[[i]], 1), "%"),
res[[i]]
)
}
# additionlly remove stat labels
colnames(res) = gsub("\\|Cases|%", "", colnames(res), perl = TRUE)
res
In the RStudio Viewer result will be in the HTML format (see image). Unfortunately, I can't test how it will be pasted to the MS Word.
Disclaimer: I am an author of expss
package.
来源:https://stackoverflow.com/questions/52340905/how-to-create-count-and-percentage-tables-and-linegraphs-with-1-independent-vari