Question
In the gt package, the summary_rows() function readily supports calculating the mean of the per-observation percentages, but that is not the same as the overall percentage distribution. I've come up with a solution (below) that works, but only by adding the overall percentages one column at a time. Is there a way of automating the addition of these overall percentages?
library(dplyr)
library(gt)
# Create test data
set.seed(1)
df <- tibble(some_letter = sample(letters, size = 10, replace = FALSE),
             num1 = sample(100:200, size = 10, replace = FALSE),
             num2 = sample(100:200, size = 10, replace = FALSE),
             n = num1 + num2) %>%
  mutate(across(starts_with("num"), ~ .x / n, .names = "pct_{col}"))
# Use dplyr to calculate the correct overall totals and percentages [target]
# (across() replaces the deprecated summarise_at()/funs() combination)
df %>%
  summarise(across(c(num1, num2, n), sum)) %>%
  mutate(across(starts_with("num"), ~ .x / n, .names = "pct_{col}"))
# Create the table in gt, using a separate call to summary_rows() for each percentage
gt(df) %>%
  summary_rows(fns = list(TOTAL = "sum"), columns = vars(num1, num2, n)) %>%
  summary_rows(fns = list(TOTAL = ~ sum(df$num1) / sum(df$n)), columns = vars(pct_num1)) %>%
  summary_rows(fns = list(TOTAL = ~ sum(df$num2) / sum(df$n)), columns = vars(pct_num2))
Answer 1:
I think the solution you propose is the right one. Because the percentages are computed row-wise, the summary value has to be computed separately for each column, so you are forced to call summary_rows once per percentage column (pct_num1, pct_num2). The great advantage of the gt package is that you have precise control over the values that appear in each cell of the summary rows. The disadvantage is that the code ends up fairly verbose.
In the code below, I reproduce the same problem with a minimal example. I do not define column n, so that the use of the rowwise function is shown more clearly.
library(dplyr)
library(gt)
df_ex <- tribble(
  ~group, ~num1, ~num2,
  "A",    4,     1,
  "B",    5,     5
) %>%
  rowwise() %>%
  mutate(
    across(starts_with("num"),
           ~ .x / sum(c_across(starts_with("num"))),
           .names = "pct{col}")) %>%
  ungroup()
df_ex
#> # A tibble: 2 x 5
#>   group  num1  num2 pctnum1 pctnum2
#>   <chr> <dbl> <dbl>   <dbl>   <dbl>
#> 1 A         4     1     0.8     0.2
#> 2 B         5     5     0.5     0.5
These are the values that will appear in the summary row:
df_ex %>%
  summarise(num1 = sum(num1), num2 = sum(num2)) %>%
  rowwise() %>%
  mutate(pctnum1 = num1 / sum(c_across(starts_with("num"))),
         pctnum2 = num2 / sum(c_across(starts_with("num"))))
#> # A tibble: 1 x 4
#> # Rowwise:
#>    num1  num2 pctnum1 pctnum2
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     9     6     0.6     0.4
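To make it concrete why a plain "mean" summary is not enough here, the following base R sketch redoes the arithmetic for the pctnum1 column of the minimal example. The variable names (row_pct, mean_of_row_pct, overall_pct) are only illustrative and do not come from the thread.

```r
# Values of num1 and num2 from the minimal example (groups A and B)
num1 <- c(4, 5)
num2 <- c(1, 5)

# Row-wise percentages, as stored in pctnum1
row_pct <- num1 / (num1 + num2)

# What a "mean" summary row would show: the average of the row percentages
mean_of_row_pct <- mean(row_pct)

# The overall percentage: total of num1 over the grand total
overall_pct <- sum(num1) / sum(num1 + num2)

mean_of_row_pct  # (0.8 + 0.5) / 2 = 0.65
overall_pct      # 9 / 15 = 0.6
```

The two numbers only coincide when every row has the same total, which is not the case here (5 versus 10).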
To make the code more readable, in my opinion, you can define functions that compute the values shown in the summary rows. This is still essentially the same as your solution, with a few cosmetic changes (the use of rowwise and summary-cell functions defined outside the pipeline). I hope you find this useful.
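# x (the column values passed in by gt) is ignored on purpose:
# the overall share is computed from the original data frame instead.
compute_f1 <- function(x, df) {
  sum(df$num1) / sum(df$num1 + df$num2)
}
compute_f2 <- function(x, df) {
  sum(df$num2) / sum(df$num1 + df$num2)
}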
df_ex %>%
  gt() %>%
  summary_rows(fns = list(TOTAL = "sum"), columns = vars(num1, num2),
               formatter = fmt_number, decimals = 0) %>%
  summary_rows(fns = list(TOTAL = ~ compute_f1(.x, df_ex)), columns = vars(pctnum1),
               formatter = fmt_number, decimals = 1) %>%
  summary_rows(fns = list(TOTAL = ~ compute_f2(.x, df_ex)), columns = vars(pctnum2),
               formatter = fmt_number, decimals = 1)
Created on 2020-11-14 by the reprex package (v0.3.0)
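One possible way to avoid writing a separate call per percentage column, offered only as a sketch: the overall percentage equals the mean of the row percentages weighted by each row's total, because sum(pct * n) / sum(n) = sum(num) / sum(n). A single aggregation function can therefore serve every pct_ column. The code below makes some assumptions that are not part of the thread: it uses grand_summary_rows() (gt's whole-table counterpart of summary_rows()) rather than the summary_rows() calls shown above, it selects columns with starts_with() and a character vector, and it has not been checked against every gt release, so the arguments may need adjusting for your version.

```r
library(dplyr)
library(gt)

# Same test data as in the question
set.seed(1)
df <- tibble(some_letter = sample(letters, size = 10, replace = FALSE),
             num1 = sample(100:200, size = 10, replace = FALSE),
             num2 = sample(100:200, size = 10, replace = FALSE),
             n = num1 + num2) %>%
  mutate(across(starts_with("num"), ~ .x / n, .names = "pct_{col}"))

# Check the identity in dplyr first: weighting each row percentage by its
# row total n gives the overall share for every pct_ column at once.
overall <- df %>%
  summarise(across(starts_with("pct_"), ~ weighted.mean(.x, n)))

# Assumption: one grand summary call covers all pct_ columns, because the
# same weighted mean (weights taken from df$n) is valid for each of them.
tbl <- gt(df) %>%
  grand_summary_rows(columns = c("num1", "num2", "n"),
                     fns = list(TOTAL = "sum")) %>%
  grand_summary_rows(columns = starts_with("pct_"),
                     fns = list(TOTAL = ~ weighted.mean(., df$n)))

overall
tbl
```

The weights must come from the same rows, in the same order, as the table passed to gt(); if the table is filtered or reordered first, the weights in df$n should be taken from that modified data frame.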
Source: https://stackoverflow.com/questions/63039692/how-can-you-automate-the-addition-of-overall-percentages-to-the-row-summary-in-t