Set column value using the first next row in the same group that meets a condition

前端 未结 3 842
予麋鹿
予麋鹿 2021-02-08 10:26

I am new to R and this is my first question on stackoverflow.

I am trying

  • to assign by reference to a new column
  • for each row
  • using the v
3条回答
  •  無奈伤痛
    2021-02-08 10:50

    Here is a quick and dirty way which doesn't require much thinking on your part, and captures the first viable option in the subset and leaves an NA if non exists.

    the do(f(.)) call evaluates the predefined function f on each subset of dt defined by the group_by statement. I would go translate that simple script into Rcpp for serious use.

    library(dplyr)
    f <- function(x){
      x <- x %>% mutate(founddate = as.Date(NA))
    
      for(i in 1:nrow(x)){
        y <- x[i, "date_down"]
        x[i, "founddate"] <-(x[-c(1:i),] %>% filter(code == "p", date_up > y) %>% select(date_up))[1, ]
      }
    
      return(x)
    }
    
    dt %>% group_by(id) %>% do(f(.))
    
    
    # A tibble: 12 x 5
    # Groups:   id [6]
          id code  date_down  date_up    founddate 
                       
     1     1 p     2019-01-01 2019-01-02 NA        
     2     1 f     2019-01-02 2019-01-03 NA        
     3     2 f     2019-01-02 2019-01-02 NA        
     4     2 p     2019-01-03 NA         NA        
     5     3 p     2019-01-04 NA         NA        
     6     4   2019-01-05 2019-01-05 NA        
     7     5 f     2019-01-07 2019-01-08 2019-01-08
     8     5 p     2019-01-07 2019-01-08 2019-01-09
     9     5 p     2019-01-09 2019-01-09 NA        
    10     6 f     2019-01-10 2019-01-10 2019-01-11
    11     6 p     2019-01-10 2019-01-10 2019-01-11
    12     6 p     2019-01-10 2019-01-11 NA 
    

    Your Comment about terrible performance is unsurprising. I would personal message this if I knew how, but below is a Rcpp::cppFunction to do the same thing.

    Rcpp::cppFunction('DataFrame fC(DataFrame x) {
                        int i, j;
                        int n = x.nrows();
                        CharacterVector code = x["code"];
                        DateVector date_up = x["date_up"];
                        DateVector date_down = x["date_down"];
                        DateVector founddate = rep(NA_REAL, n);
    
                        for(i = 0; i < n; i++){
                          for(j = i + 1; j < n; j++){
                            if(code(j) == "p"){
                              if(date_up(j) > date_down(i)){
                                founddate(i) = date_up(j);
                                break;
                              } else{
                                continue;
                              }
                            } else{
                              continue;
                            }
                          }
                        }
                        x.push_back(founddate, "founddate");
                        return x;
                        }')
    
    dt %>% group_by(id) %>% do(fC(.))
    

提交回复
热议问题