I am new to R and this is my first question on stackoverflow.
I am trying
Here is a quick and dirty way which doesn't require much thinking on your part, and captures the first viable option in the subset and leaves an NA
if non exists.
the do(f(.))
call evaluates the predefined function f
on each subset of dt
defined by the group_by
statement. I would go translate that simple script into Rcpp
for serious use.
library(dplyr)
f <- function(x){
x <- x %>% mutate(founddate = as.Date(NA))
for(i in 1:nrow(x)){
y <- x[i, "date_down"]
x[i, "founddate"] <-(x[-c(1:i),] %>% filter(code == "p", date_up > y) %>% select(date_up))[1, ]
}
return(x)
}
dt %>% group_by(id) %>% do(f(.))
# A tibble: 12 x 5
# Groups: id [6]
id code date_down date_up founddate
1 1 p 2019-01-01 2019-01-02 NA
2 1 f 2019-01-02 2019-01-03 NA
3 2 f 2019-01-02 2019-01-02 NA
4 2 p 2019-01-03 NA NA
5 3 p 2019-01-04 NA NA
6 4 2019-01-05 2019-01-05 NA
7 5 f 2019-01-07 2019-01-08 2019-01-08
8 5 p 2019-01-07 2019-01-08 2019-01-09
9 5 p 2019-01-09 2019-01-09 NA
10 6 f 2019-01-10 2019-01-10 2019-01-11
11 6 p 2019-01-10 2019-01-10 2019-01-11
12 6 p 2019-01-10 2019-01-11 NA
Your Comment about terrible performance is unsurprising. I would personal message this if I knew how, but below is a Rcpp::cppFunction
to do the same thing.
Rcpp::cppFunction('DataFrame fC(DataFrame x) {
int i, j;
int n = x.nrows();
CharacterVector code = x["code"];
DateVector date_up = x["date_up"];
DateVector date_down = x["date_down"];
DateVector founddate = rep(NA_REAL, n);
for(i = 0; i < n; i++){
for(j = i + 1; j < n; j++){
if(code(j) == "p"){
if(date_up(j) > date_down(i)){
founddate(i) = date_up(j);
break;
} else{
continue;
}
} else{
continue;
}
}
}
x.push_back(founddate, "founddate");
return x;
}')
dt %>% group_by(id) %>% do(fC(.))