问题
I have the following data.frame
:
df <- data.frame(id=c(1,2,3),
first.date=as.Date(c("2014-01-01", "2014-03-01", "2014-06-01")),
second.date=as.Date(c("2015-01-01", "2015-03-01", "2015-06-1")),
third.date=as.Date(c("2016-01-01", "2017-03-01", "2018-06-1")),
fourth.date=as.Date(c("2017-01-01", "2018-03-01", "2019-06-1")))
> df
id first.date second.date third.date fourth.date
1 1 2014-01-01 2015-01-01 2016-01-01 2017-01-01
2 2 2014-03-01 2015-03-01 2017-03-01 2018-03-01
3 3 2014-06-01 2015-06-01 2018-06-01 2019-06-01
Each row represents three timespans; i.e. the time spans between first.date
and second.date
, second.date
and third.date
, and third.date
and fourth.date
respectively.
I would like to, in lack of a better word, unnest the dataframe to obtain this instead:
id StartDate EndDate
1 1 2014-01-01 2015-01-01
2 1 2015-01-01 2016-01-01
3 1 2016-01-01 2017-01-01
4 2 2014-03-01 2015-03-01
5 2 2015-03-01 2017-03-01
6 2 2017-03-01 2018-03-01
7 3 2014-06-01 2015-06-01
8 3 2015-06-01 2018-06-01
9 3 2018-06-01 2019-06-01
I have been playing around with the unnest
function from the tidyr
package, but I came to the conclusion that I don't think it's what I'm really looking for.
Any suggestions?
回答1:
You can try tidyr/dplyr as follows:
library(tidyr)
library(dplyr)
df %>% gather(DateType, StartDate, -id) %>% select(-DateType) %>% arrange(id) %>% group_by(id) %>% mutate(EndDate = lead(StartDate))
You can eliminate the last row in each id group by adding:
%>% slice(-4)
To the above pipeline.
回答2:
We can use data.table
. We convert the 'data.frame' to 'data.table' (setDT(df)
), then melt
the dataset to long
format, use shift
with type='lead'
grouped by 'id' and then remove the NA
elements.
library(data.table)
na.omit(melt(setDT(df), id.var='id')[, shift(value,0:1, type='lead') , id])
# id V1 V2
#1: 1 2014-01-01 2015-01-01
#2: 1 2015-01-01 2016-01-01
#3: 1 2016-01-01 2017-01-01
#4: 2 2014-03-01 2015-03-01
#5: 2 2015-03-01 2017-03-01
#6: 2 2017-03-01 2018-03-01
#7: 3 2014-06-01 2015-06-01
#8: 3 2015-06-01 2018-06-01
#9: 3 2018-06-01 2019-06-01
The column names can be changed by using either setnames
or earlier in the shift
step.
来源:https://stackoverflow.com/questions/34662114/unnesting-a-dataframe-in-r