Question
This is an extension of the post "Collapse / concatenate / aggregate a column to a single comma separated string within each group".
Goal: aggregate multiple columns by one grouping variable, and separate the individual values with a separator of choice.
Reproducible example:
data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
A B C D E
1 111 100 1 15 1
2 111 100 2 16 NA
3 111 100 NA 17 NA
4 222 200 1 18 1
5 222 200 2 19 NA
6 222 200 NA 20 NA
A is the grouping variable, but B should still be displayed in the overall result (B depends on A in my application). C, D and E are the variables to be collapsed into delimited character strings.
Desired Output
A B C D E
1 111 100 1,2 15,16,17 1
2 222 200 1,2 18,19,20 1
I don't have a lot of experience with R. I tried to extend the solutions G. Grothendieck posted in the linked question to meet my requirements, but I can't quite get them to work for multiple columns.
What would be a proper implementation to get the desired output?
In my attempts I focused specifically on group_by and summarise_all, and on aggregate. They are a complete mess, so I don't believe showing them would be helpful.
EDIT: The posted solutions work great for producing the desired result! To add further value to this post for those who find it: how could users choose their own separator characters, e.g. '-' or '\n'?
The current solutions by @akrun and @tmfmnk both result in lists instead of a concatenated character string. Please correct me if I have said this incorrectly.
> data$A
[1] 111 111 111 222 222 222
> data$B
[1] 100 100 100 200 200 200
> data$C
[1] 1 2 NA 1 2 NA
> data$D
[1] 15 16 17 18 19 20
> data$E
[1] 1 NA NA 1 NA NA
Answer 1:
We can group by 'A' and 'B', and use summarise_at to paste all the non-NA elements:
library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups: A [2]
# A B C D E
# <dbl> <dbl> <chr> <chr> <chr>
#1 111 100 1, 2 15, 16, 17 1
#2 222 200 1, 2 18, 19, 20 1
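The ", " in the output above comes from toString(), which always collapses its input with ", ". Below is a minimal base-R sketch that swaps it for paste(..., collapse = ) to get the '-' or '\n' separators asked about in the question. Here x is simply column C of group 111, typed in by hand for illustration:

```r
# Column C for group A == 111 in the example data
x <- c(1, 2, NA)

# Drop the NAs first, as the summarise_at() call does
kept <- x[!is.na(x)]

# toString() always uses ", "; paste(collapse = ) accepts any separator
default_str <- toString(kept)
dash_str    <- paste(kept, collapse = "-")
newline_str <- paste(kept, collapse = "\n")

print(default_str)  # [1] "1, 2"
print(dash_str)     # [1] "1-2"
cat(newline_str)    # prints 1 and 2 on separate lines
```

Each result is a length-one character vector, not a list.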
If we need to pass a custom delimiter, use paste or str_c with the collapse argument:
library(stringr)
# collapse = "_" sets the separator; "-", "\n", etc. work the same way
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse = "_"))
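If you would rather avoid the dplyr/stringr dependencies, the same idea can be written in plain base R. The function collapse_by_group() below is a hypothetical helper written only for this example; it is not part of either answer or of any package. It assumes the grouping columns are passed by name, and it collapses every other column with a caller-chosen sep after dropping NAs:

```r
# Hypothetical helper: collapse every non-grouping column to one string
# per group, dropping NAs and joining the remaining values with `sep`.
collapse_by_group <- function(df, group_cols, sep = ", ") {
  value_cols <- setdiff(names(df), group_cols)
  # One data frame per observed combination of the grouping columns
  pieces <- split(df, df[group_cols], drop = TRUE)
  rows <- lapply(pieces, function(piece) {
    out <- piece[1, group_cols, drop = FALSE]  # group keys from the first row
    for (col in value_cols) {
      vals <- piece[[col]]
      out[[col]] <- paste(vals[!is.na(vals)], collapse = sep)
    }
    out
  })
  result <- do.call(rbind, rows)
  # Sort by the grouping columns and reset the row names
  ord <- do.call(order, unname(as.list(result[group_cols])))
  result <- result[ord, , drop = FALSE]
  rownames(result) <- NULL
  result
}

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200, 3)),
                   C = rep(c(1, 2, NA), 2), D = c(15:20), E = rep(c(1, NA, NA), 2))

res_dash <- collapse_by_group(data, c("A", "B"), sep = "-")
res_nl   <- collapse_by_group(data, c("A", "B"), sep = "\n")
res_dash
```

With sep = "\n", the cells contain embedded newlines. cat(res_nl$D[1]) prints 15, 16 and 17 on separate lines.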
Or, using base R with aggregate:
aggregate(. ~ A + B, data,
          FUN = function(x) toString(x[!is.na(x)]),
          na.action = NULL)
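The same aggregate() call accepts any separator once toString() is replaced by paste(..., collapse = sep). Below is a minimal sketch that assumes " - " as the separator purely for illustration. It uses na.action = na.pass, which, like na.action = NULL above, keeps rows containing NA so that FUN can drop them column by column. The sapply() line also shows that the collapsed columns come back as plain character vectors rather than lists:

```r
data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200, 3)),
                   C = rep(c(1, 2, NA), 2), D = c(15:20), E = rep(c(1, NA, NA), 2))

sep <- " - "  # illustrative choice; "-", "_" or "\n" work the same way

# Collapse the non-NA values of every other column within each A/B group
res <- aggregate(. ~ A + B, data,
                 FUN = function(x) paste(x[!is.na(x)], collapse = sep),
                 na.action = na.pass)

res
sapply(res, class)  # C, D and E are "character"
```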
Answer 2:
With dplyr, you can do:
library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise_all(~ toString(na.omit(.)))
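This answer drops the NAs with na.omit() instead of .[!is.na(.)]. For a plain vector, both give the same string. na.omit() only adds an "na.action" attribute recording the removed positions, and that attribute is discarded when toString() converts the values to character. Below is a small base-R check that uses column C of group 111 as input:

```r
x <- c(1, 2, NA)  # column C for group A == 111

cleaned <- na.omit(x)
attr(cleaned, "na.action")  # records which positions were removed

via_na_omit <- toString(cleaned)
via_is_na   <- toString(x[!is.na(x)])

identical(via_na_omit, via_is_na)  # [1] TRUE
```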
A B C D E
<dbl> <dbl> <chr> <chr> <chr>
1 111 100 1, 2 15, 16, 17 1
2 222 200 1, 2 18, 19, 20 1
Source: https://stackoverflow.com/questions/60233119/collapse-concatenate-aggregate-multiple-columns-to-a-single-comma-separated