Assign unique ID to distinct values within Group with dplyr

前端未结

关注

 2  1306

Problem: I need to make a unique ID field for data that has two levels of grouping. In the example code here, it is Emp and Color. The ID needs to

相关标签:

2条回答

情歌与酒

2021-01-20 13:46

We can try

dat %>% 
   group_by(Emp) %>%
   mutate(temp = match(Color, unique(Color)),
          temp2 = duplicated(Color)+1,
          ID = sprintf("%s.%02d.%03d", Emp, temp, temp2))%>%
   select(-temp, -temp2)  
#    Emp  Color       ID
#   <chr>  <chr>    <chr>
#1     A    Red A.01.001
#2     A  Green A.02.001
#3     A  Green A.02.002
#4     B Orange B.01.001
#5     B Yellow B.02.001
#6     C  Brown C.01.001

0 讨论(0)

别那么骄傲

2021-01-20 14:07

Using data.table and sprintf:

library(data.table)
setDT(dat)[, ID := sprintf('%s.%02d.%03d', 
                           Emp, rleid(Color), rowid(rleid(Color))), 
           by = Emp]

you get:

> dat
   Emp  Color       ID
1:   A    Red A.01.001
2:   A  Green A.02.001
3:   A  Green A.02.002
4:   B Orange B.01.001
5:   B Yellow B.02.001
6:   C  Brown C.01.001

How this works:

You convert dat to a data.table with setDT()
Group by Emp.
And create the ID-variable with the sprintf-function. With sprintf you paste several vector easily together according to a specified format.
The use of := means that the data.table is updated by reference.
%s indicates that a string is to be used in the first part (which is Emp). %02d & %03d indicates that a number needs to have two or three digits with a leading zero(s) when needed. The dots in between will taken literally and thus in cluded in the resulting string.

Adressing the comment of @jsta, if the values in the Color-column are not sequential you can use:

setDT(dat)[, r := as.integer(factor(Color, levels = unique(Color))), by = Emp
           ][, ID := sprintf('%s.%02d.%03d', 
                             Emp, r, rowid(r)), 
             by = Emp][, r:= NULL]

This will also maintain the order in which the Color column is presented. Instead of as.integer(factor(Color, levels = unique(Color))) you can also use match(Color, unique(Color)) as shown by akrun.

Implementing the above on a bit larger dataset to illustrate:

dat2 <- rbindlist(list(dat,dat))
dat2[, r := match(Color, unique(Color)), by = Emp
     ][, ID := sprintf('%s.%02d.%03d', 
                     Emp, r, rowid(r)), 
     by = Emp]

gets you:

> dat2
    Emp  Color r       ID
 1:   A    Red 1 A.01.001
 2:   A  Green 2 A.02.001
 3:   A  Green 2 A.02.002
 4:   B Orange 1 B.01.001
 5:   B Yellow 2 B.02.001
 6:   C  Brown 1 C.01.001
 7:   A    Red 1 A.01.002
 8:   A  Green 2 A.02.003
 9:   A  Green 2 A.02.004
10:   B Orange 1 B.01.002
11:   B Yellow 2 B.02.002
12:   C  Brown 1 C.01.002

0 讨论(0)