Question
I have a data set that looks like this:
Id Subject Date Vitals Value
10 John 2001-05-29 HeartRate 65
10 John 2001-05-29 HeartRate 68
10 John 2001-05-29 BP-Arterial 48
10 John 2001-05-29 PulseRate 64
34 Pete 2005-08-15 HeartRate 68
34 Pete 2005-08-15 BP-Arterial 56
10 John 2004-09-25 HeartRate 65
10 John 2004-09-25 BP-Arterial 64
10 John 2004-09-25 PulseRate 63
34 Pete 2007-07-21 BP-Arterial 68
34 Pete 2007-07-21 PulseRate 56
I want to do two things:
1) Group by Vitals.
2) Count the number of Vitals that were measured for each ID on a specific date (ID + Date)
and collapse and paste these values as shown below.
Vitals Series
HeartRate 2,1,1
BP-Arterial 1,1,1,1
PulseRate 1,1,1
The value under the Series column for HeartRate is 2,1,1 because HeartRate was measured twice for ID 10 on 2001-05-29, once for ID 34 on 2005-08-15, and once for ID 10 on 2004-09-25.
I am not sure how to collapse and paste these values using dplyr. Any help is much appreciated.
Answer 1:
"Count the number of Vitals that were measured for each ID on a specific date (ID + Date)"
This means you need to group by all three columns (Vitals, Id, and Date) and count the rows in each group. You can then regroup by Vitals alone for the final collapse:
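library(dplyr)

dat %>%
  group_by(Vitals, Id, Date) %>%   # one group per vital, subject, and date
  summarize(n = n()) %>%           # number of measurements in each group
  ungroup() %>%
  group_by(Vitals) %>%             # regroup by vital only
  summarize(Series = paste(n, collapse = ','))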
# # A tibble: 3 × 2
# Vitals Series
# <fctr> <chr>
# 1 BP-Arterial 1,1,1,1
# 2 HeartRate 2,1,1
# 3 PulseRate 1,1,1
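For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is retyped from the table in the question and named `dat` to match the answer above; the name `series` is only illustrative. It uses `count()`, the dplyr shorthand for `group_by()` followed by `summarise(n = n())`, so it is a variant of the code above rather than the answerer's exact code.

```r
library(dplyr)

# Sample data retyped from the question (column types are an assumption).
dat <- data.frame(
  Id      = c(10, 10, 10, 10, 34, 34, 10, 10, 10, 34, 34),
  Subject = c("John", "John", "John", "John", "Pete", "Pete",
              "John", "John", "John", "Pete", "Pete"),
  Date    = c("2001-05-29", "2001-05-29", "2001-05-29", "2001-05-29",
              "2005-08-15", "2005-08-15",
              "2004-09-25", "2004-09-25", "2004-09-25",
              "2007-07-21", "2007-07-21"),
  Vitals  = c("HeartRate", "HeartRate", "BP-Arterial", "PulseRate",
              "HeartRate", "BP-Arterial",
              "HeartRate", "BP-Arterial", "PulseRate",
              "BP-Arterial", "PulseRate"),
  Value   = c(65, 68, 48, 64, 68, 56, 65, 64, 63, 68, 56),
  stringsAsFactors = FALSE
)

series <- dat %>%
  count(Vitals, Id, Date) %>%   # one row per Vitals/Id/Date with its count n
  group_by(Vitals) %>%          # collapse the counts within each vital
  summarise(Series = paste(n, collapse = ","))

print(series)
```

Note that the counts within each Series string come out ordered by Id and then Date, because `count()` sorts its output by the grouping columns.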
Answer 2:
With dplyr and rle (run-length encoding; see ?rle for more details):
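library(dplyr)

# rle() returns the lengths of consecutive runs of identical Vitals in each Id/Date group.
# as.character() is needed if Vitals is a factor, because rle() rejects factors.
newDF = DF %>%
  group_by(Id, Date) %>%
  do(., data.frame(Series = paste(rle(as.character(.$Vitals))$lengths, collapse = ","),
                   stringsAsFactors = FALSE)) %>%
  as.data.frame()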
newDF
# Id Date Series
#1 10 2001-05-29 2,1,1
#2 10 2004-09-25 1,1,1
#3 34 2005-08-15 1,1
#4 34 2007-07-21 1,1
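One thing worth keeping in mind with this approach: rle() counts runs of consecutive equal values, not totals per value, so it relies on rows for the same vital being adjacent within each Id/Date group. The base R sketch below illustrates this; the vectors `x` and `y` are made up for the example (`x` mirrors the rows for ID 10 on 2001-05-29).

```r
# Identical values are adjacent, so the run lengths equal the per-vital counts.
x <- c("HeartRate", "HeartRate", "BP-Arterial", "PulseRate")
run_x <- paste(rle(x)$lengths, collapse = ",")
print(run_x)   # "2,1,1"

# The same vital in non-adjacent rows is split into separate runs.
y <- c("HeartRate", "BP-Arterial", "HeartRate")
run_y <- paste(rle(y)$lengths, collapse = ",")
print(run_y)   # "1,1,1"
```

If the rows might not be sorted, arranging by Vitals within each group before calling rle(), or counting with n() as in Answer 1, avoids the issue.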
Source: https://stackoverflow.com/questions/40143046/r-dplyr-group-by-values-collapse-and-paste