问题
I have a data frame like the following:
Date Flare Painmed_Use
1 2015-12-01 0 0
2 2015-12-02 0 0
3 2015-12-03 0 0
4 2015-12-04 0 0
5 2015-12-05 0 0
6 2015-12-06 0 1
7 2015-12-07 1 4
8 2015-12-08 1 3
9 2015-12-09 1 1
10 2015-12-10 1 0
11 2015-12-11 0 0
12 2015-12-12 0 0
13 2015-12-13 1 2
14 2015-12-14 1 3
15 2015-12-15 1 1
16 2015-12-16 0 0
I'm trying to find the length of each flare as well as the total med use during each flare using dplyr. My current solution (inspired by Use rle to group by runs when using dplyr),
df %>%
group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths), yy$lengths)}, Flare) %>%
summarize(Painmed_UseCum = sum(Painmed_Use),FlareLength = n())
gives the following output:
yy Flare Painmed_UseCum FlareLength
<int> <int> <dbl> <int>
1 1 0 1 6
2 2 1 8 4
3 3 0 0 2
4 4 1 6 3
5 5 0 0 1
This is almost exactly what I need. However, I can't figure out how to preserve other columns, the critical one being the date that corresponds to the last row of a particular flare. So, the output I'm seeking is the same as above but with the addition of the Dates, like so:
Date yy Flare Painmed_UseCum FlareLength
<int> <int> <dbl> <int>
1 2015-12-06 1 0 1 6
2 2015-12-10 2 1 8 4
3 2015-12-12 3 0 0 2
4 2015-12-15 4 1 6 3
5 2015-12-16 5 0 0 1
Note: In some ways this is a follow up from a previous question of mine (R code to get max count of time series data by group) but my attempt to keep that question simpler, though perhaps useful to others, ended up necessitating this further question.
回答1:
You could either include Date in summarise
library(dplyr)
df %>%
group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths),yy$lengths)}) %>%
summarize(Painmed_UseCum = sum(Painmed_Use),FlareLength = n(), Date = max(Date))
# Groups: yy, Flare [5]
# Date Flare Painmed_Use yy
# <date> <int> <int> <int>
#1 2015-12-06 0 1 1
#2 2015-12-10 1 0 2
#3 2015-12-12 0 0 3
#4 2015-12-15 1 1 4
#5 2015-12-16 0 0 5
Or if there are more columns to preserve better approach is to use mutate
and select the last row in each group.
df %>%
group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths), yy$lengths)}) %>%
mutate(Painmed_UseCum = sum(Painmed_Use),FlareLength = n()) %>%
slice(n())
To create groups, we can replace rle
with rleid
from data.table
which would be simpler.
group_by(yy = data.table::rleid(Flare))
来源:https://stackoverflow.com/questions/59978973/use-dplyr-to-summarize-but-preserve-date-of-group-row