Question
I need to summarize a grouped data_frame (note: a dplyr solution would be much appreciated but isn't mandatory), computing both a statistic on each group (simple) and the same statistic on the "other" groups.
Minimal example
if (!require(pacman)) install.packages("pacman")
pacman::p_load(dplyr)

# data_frame() is deprecated in recent tibble releases; tibble() is its replacement
df <- data_frame(
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(1, 2, 3, 4, 5, 6)
)

res <- df %>%
  group_by(group) %>%
  summarize(
    median = median(value)
    # median_other  = ... ??? ...  # I need the median of all "other" groups
    # median_before = ... ??? ...  # I need the median of some groups (e.g. those
    #                              # "before" in alphabetical order, but any rule
    #                              # that acts as a "selection function" depending
    #                              # on the current group is fine)
  )
My expected result is the following:
group  median  median_other  median_before
a      1.5     4.5           NA
b      3.5     3.5           1.5
c      5.5     2.5           2.5
I've searched Google for strings like "dplyr summarize excluding groups" and "dplyr summarize other then group", and I've searched the dplyr documentation, but I wasn't able to find a solution.
The existing question "How to summarize value not matching the group using dplyr" does not apply here, because its answer works only for sum; that is, it is a function-specific solution (relying on simple arithmetic that does not take the variability within each group into account). What about more complex functions (e.g. mean, sd, or a user-defined function)? :-)
Thanks to all
PS: summarize() is just an example; the same question applies to mutate() or other dplyr functions that operate on groups.
Answer 1:
Here's my solution:
res <- df %>%
  group_by(group) %>%
  summarise(med_group = median(value),
            med_other = (median(df$value[df$group != group]))) %>%
  mutate(med_before = lag(med_group))
> res
Source: local data frame [3 x 4]
  group med_group med_other med_before
  (chr)     (dbl)     (dbl)      (dbl)
1     a       1.5       4.5         NA
2     b       3.5       3.5        1.5
3     c       5.5       2.5        3.5
I was trying to come up with an all-dplyr solution, but base R subsetting works just fine: median(df$value[df$group != group]) returns the median of all observations that are not in the current group.
I hope this helps you solve your problem.
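The same subsetting idea can be extended to an arbitrary function and an arbitrary "selection rule". The following is only a sketch under a few assumptions: the helper `summarise_elsewhere()` and the predicates `is_other()` and `is_before()` are hypothetical names (not part of dplyr), the ungrouped `df` is assumed to still be visible from inside summarise(), and `stats::median` is passed with its namespace so the function cannot be confused with a summary column of the same name. Unlike lag(med_group), the "before" rule here pools the raw values of all alphabetically earlier groups, which gives 2.5 for group c, as in the question's expected table.

```r
library(dplyr)

df <- tibble(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical selection rules: which groups count for the current one?
is_other  <- function(g, cur) g != cur   # every group except the current one
is_before <- function(g, cur) g < cur    # groups earlier in alphabetical order

# Hypothetical helper: apply `f` to the values of all rows of `data` whose
# group is selected by `keep`; return NA when no row is selected.
summarise_elsewhere <- function(data, current, f, keep) {
  x <- data$value[keep(data$group, current)]
  if (length(x) == 0) NA_real_ else f(x)
}

res <- df %>%
  group_by(group) %>%
  summarise(
    med_group  = median(value),
    # group[1] is the label of the group currently being summarised
    med_other  = summarise_elsewhere(df, group[1], stats::median, is_other),
    med_before = summarise_elsewhere(df, group[1], stats::median, is_before)
  )

print(res)
```

Swapping `stats::median` for `mean`, `sd`, or a user-defined function should work the same way, as long as that function accepts a numeric vector and returns a single value.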
Answer 2:
I don't think it is possible, in general, to perform operations on other groups within summarise() (i.e. I think the other groups are not "visible" while a given group is being summarised). You can, however, define your own functions and use them in mutate() to apply them to a variable of the summarised result. For your updated example you can use
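# Both helpers take the vector of per-group medians (one element per group).
# Median of all elements except the i-th one:
calc_med_other <- function(x) sapply(seq_along(x), function(i) median(x[-i]))
# Median of all elements before the i-th one (NA for the first element):
calc_med_before <- function(x) sapply(seq_along(x), function(i) if (i == 1) NA else median(x[seq_len(i - 1)]))

df %>%
  group_by(group) %>%
  summarize(med = median(value)) %>%
  mutate(
    med_other = calc_med_other(med),
    med_before = calc_med_before(med)
  )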
#   group   med med_other med_before
#   (chr) (dbl)     (dbl)      (dbl)
# 1     a   1.5       4.5         NA
# 2     b   3.5       3.5        1.5
# 3     c   5.5       2.5        2.5
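Note that the helpers above work on the per-group medians, so they return a median of medians. That matches the expected table for this data, but in general it need not equal the median of the pooled raw values. Since dplyr is not mandatory, here is a minimal base R sketch that works on the raw values instead; the helper name `apply_or_na()` is hypothetical, and "before" is assumed to mean "alphabetically earlier group label".

```r
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

groups <- sort(unique(df$group))

# Hypothetical helper: apply `f` to the values selected by a logical mask,
# returning NA when the mask selects nothing.
apply_or_na <- function(x, mask, f) {
  if (any(mask)) f(x[mask]) else NA_real_
}

res <- data.frame(
  group      = groups,
  # statistic on the group itself
  med_group  = sapply(groups, function(g) median(df$value[df$group == g])),
  # same statistic on the raw values of every other group
  med_other  = sapply(groups, function(g) apply_or_na(df$value, df$group != g, median)),
  # same statistic on the raw values of alphabetically earlier groups
  med_before = sapply(groups, function(g) apply_or_na(df$value, df$group < g, median)),
  row.names  = NULL
)

print(res)
```

Replacing `median` with another function, or `df$group < g` with another mask, changes the statistic or the selection rule without touching the rest of the code.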
Source: https://stackoverflow.com/questions/36450278/summarize-with-dplyr-other-then-groups