Assign unique ID to distinct values within Group with dplyr

前端 未结 2 1306
刺人心
刺人心 2021-01-20 13:22

Problem: I need to make a unique ID field for data that has two levels of grouping. In the example code here, it is Emp and Color. The ID needs to

相关标签:
2条回答
  • 2021-01-20 13:46

    We can try

    dat %>% 
       group_by(Emp) %>%
       mutate(temp = match(Color, unique(Color)),
              temp2 = duplicated(Color)+1,
              ID = sprintf("%s.%02d.%03d", Emp, temp, temp2))%>%
       select(-temp, -temp2)  
    #    Emp  Color       ID
    #   <chr>  <chr>    <chr>
    #1     A    Red A.01.001
    #2     A  Green A.02.001
    #3     A  Green A.02.002
    #4     B Orange B.01.001
    #5     B Yellow B.02.001
    #6     C  Brown C.01.001
    
    0 讨论(0)
  • 2021-01-20 14:07

    Using data.table and sprintf:

    library(data.table)
    setDT(dat)[, ID := sprintf('%s.%02d.%03d', 
                               Emp, rleid(Color), rowid(rleid(Color))), 
               by = Emp]
    

    you get:

    > dat
       Emp  Color       ID
    1:   A    Red A.01.001
    2:   A  Green A.02.001
    3:   A  Green A.02.002
    4:   B Orange B.01.001
    5:   B Yellow B.02.001
    6:   C  Brown C.01.001
    

    How this works:

    • You convert dat to a data.table with setDT()
    • Group by Emp.
    • And create the ID-variable with the sprintf-function. With sprintf you paste several vector easily together according to a specified format.
    • The use of := means that the data.table is updated by reference.
    • %s indicates that a string is to be used in the first part (which is Emp). %02d & %03d indicates that a number needs to have two or three digits with a leading zero(s) when needed. The dots in between will taken literally and thus in cluded in the resulting string.

    Adressing the comment of @jsta, if the values in the Color-column are not sequential you can use:

    setDT(dat)[, r := as.integer(factor(Color, levels = unique(Color))), by = Emp
               ][, ID := sprintf('%s.%02d.%03d', 
                                 Emp, r, rowid(r)), 
                 by = Emp][, r:= NULL]
    

    This will also maintain the order in which the Color column is presented. Instead of as.integer(factor(Color, levels = unique(Color))) you can also use match(Color, unique(Color)) as shown by akrun.

    Implementing the above on a bit larger dataset to illustrate:

    dat2 <- rbindlist(list(dat,dat))
    dat2[, r := match(Color, unique(Color)), by = Emp
         ][, ID := sprintf('%s.%02d.%03d', 
                         Emp, r, rowid(r)), 
         by = Emp]
    

    gets you:

    > dat2
        Emp  Color r       ID
     1:   A    Red 1 A.01.001
     2:   A  Green 2 A.02.001
     3:   A  Green 2 A.02.002
     4:   B Orange 1 B.01.001
     5:   B Yellow 2 B.02.001
     6:   C  Brown 1 C.01.001
     7:   A    Red 1 A.01.002
     8:   A  Green 2 A.02.003
     9:   A  Green 2 A.02.004
    10:   B Orange 1 B.01.002
    11:   B Yellow 2 B.02.002
    12:   C  Brown 1 C.01.002
    
    0 讨论(0)
提交回复
热议问题