How to create an ID variable by grouping consecutive numbers?

盖世英雄少女心 2021-01-28 18:01

I want to add an ID variable to this data. Rows whose receipt_id values form a run of consecutive numbers should share the same ID.

CUST_NO_ID  receipt_id      dollar
  12         29               


        
4 answers
  • 2021-01-28 18:34

    Assuming your data frame is already sorted by CUST_NO_ID and receipt_id, you can use cumsum on a logical vector in which TRUE marks each position where a new ID should start:

    df$ID <- cumsum(c(TRUE, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0))
    
    df
    #   CUST_NO_ID receipt_id dollar ID
    #1          12         29  20.84  1
    #2          12         30  20.21  1
    #3          12         86  24.50  2
    #4          12         87  20.68  2
    #5          12        108  25.79  3
    #6          12        109  24.93  3
    #7          12        125  20.63  4
    #8          12        126   9.90  4
    #9          19        193  69.48  5
    #10         19        194  46.88  5
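
    To see why this works, here is a minimal sketch on made-up vectors (the names `cust`, `receipt`, and `new_run` are illustrative and not part of the question's data). The receipt numbers 87 and 88 are consecutive but belong to different customers, which is the case the second condition is there to catch:

    ```r
    # Made-up vectors in the shape of the example data (illustrative only)
    cust    <- c(12, 12, 12, 19, 19)
    receipt <- c(29, 30, 86, 87, 88)

    # TRUE marks a row that starts a new run: the first row, a gap in
    # receipt numbers, or a change of customer
    new_run <- c(TRUE, diff(receipt) != 1 | diff(cust) != 0)
    new_run
    # [1]  TRUE FALSE  TRUE  TRUE FALSE

    # The running sum of the flags gives each run the next integer ID
    id <- cumsum(new_run)
    id
    # [1] 1 1 2 3 3
    ```

    Without the `diff(cust) != 0` term, rows 3 and 4 would end up in the same run even though the customer changes between them.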
    
  • 2021-01-28 18:38

    I had a similar idea to @Psidom, who beat me to the punch with cumsum. Here's a dplyr solution. Adding group_by gives you extra flexibility if you want the IDs to restart for each customer number.

    library(dplyr)

    df %>%
      mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))
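
    For the group_by variant mentioned above, a minimal sketch might look like the following. It assumes the rows are sorted by receipt_id within each customer; the sample rows are a subset of the output printed in the first answer, and the column name `id_in_cust` is made up for illustration.

    ```r
    library(dplyr)

    # A subset of the rows printed in the first answer
    df <- data.frame(
      CUST_NO_ID = c(12, 12, 12, 12, 19, 19),
      receipt_id = c(29, 30, 86, 87, 193, 194),
      dollar     = c(20.84, 20.21, 24.50, 20.68, 69.48, 46.88)
    )

    res <- df %>%
      group_by(CUST_NO_ID) %>%
      # diff() now runs within each customer, so numbering restarts at 1
      mutate(id_in_cust = cumsum(c(TRUE, diff(receipt_id) != 1))) %>%
      ungroup()

    res$id_in_cust
    # [1] 1 1 2 2 1 1
    ```

    Note that these IDs are only unique in combination with CUST_NO_ID, unlike the globally increasing IDs of the ungrouped version.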
    
  • 2021-01-28 18:42

    We can use data.table:

    library(data.table)
    setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]
    

    Or use shift():

    setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]
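
    Both one-liners look only at receipt_id, so consecutive receipt numbers that straddle two customers would share an ID. As a hedged alternative (a different technique from the two above, not the answerer's method), data.table's `rleid()` can number runs over several keys at once. The trick assumed here is that `receipt_id` minus the row position stays constant exactly while the receipt numbers increase by 1; the toy table is made up for illustration.

    ```r
    library(data.table)

    # Made-up rows: 87 and 88 are consecutive but belong to customer 19
    dt <- data.table(
      CUST_NO_ID = c(12, 12, 12, 19, 19),
      receipt_id = c(29, 30, 86, 87, 88)
    )

    # Adjacent rows share a run only if the customer is the same and
    # receipt_id - row position is unchanged (i.e. the step was exactly 1)
    dt[, id := rleid(CUST_NO_ID, receipt_id - seq_len(.N))]

    dt$id
    # [1] 1 1 2 3 3
    ```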
    
  • 2021-01-28 18:47

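    This loop does it:

    id <- 1
    
    for (row in seq_len(nrow(data))) {
      if (row == 1) {
        # The first row has no predecessor; treat it as continuing run 1
        dif <- 1
      } else {
        dif <- data[row, 'receipt_id'] - data[row - 1, 'receipt_id']
      }
    
      # A step other than 1 means a new run of receipt numbers starts here
      if (dif != 1) {
        id <- id + 1
      }
    
      data[row, 'ID'] <- id
    }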
    