R: Remove duplicates from a dataframe based on categories in a column

耶瑟儿~ 2021-02-15 16:14

Here is my example data set:

      Name Course Cateory
 1: Jason     ML      PT
 2: Jason     ML      DI
 3: Jason     ML      GT
 4: Jason     ML      SY
 5: Ja         


        
7 Answers
  •  名媛妹妹
    2021-02-15 16:18

    Here is a snippet that does what you asked:

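    # Encode the priority: levels listed first sort first.
    df$Category <- factor(df$Category, levels = c("PT", "DI", "GT", "SY"))

    # Sort rows so the highest-priority category comes first.
    df <- df[order(df$Category),]

    # Keep only the first row for each Name/Course pair.
    df[!duplicated(df[,c('Name', 'Course')]),]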
    

    Output:

     Name Course Category
    Jason     ML       PT
    Nancy     ML       PT
    Jason     DS       DI
    Nancy     DS       DI
     John     DS       GT
    James     ML       SY
    

    The idea is to sort the rows by category priority first, then drop the duplicated Name/Course pairs. duplicated() flags every occurrence of a pair after the first one, so the row that survives for each pair is the one with the highest-priority category.
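
    The same sort-then-deduplicate idea can be sketched without converting the column to a factor, by ranking each row with match(). This is only an illustration: the sample rows below are hypothetical (the data set in the question is cut off), and the priority order PT, DI, GT, SY is assumed from the factor levels used above.

    ```r
    # Hypothetical sample data; not the asker's actual data set.
    df <- data.frame(
      Name     = c("Jason", "Jason", "Jason", "Nancy", "Nancy"),
      Course   = c("ML", "ML", "DS", "ML", "ML"),
      Category = c("SY", "PT", "DI", "GT", "DI"),
      stringsAsFactors = FALSE
    )

    # Assumed priority order: earlier entries win over later ones.
    priority <- c("PT", "DI", "GT", "SY")

    # match() returns each row's position in the priority vector (1 = best).
    rank <- match(df$Category, priority)

    # Sort by rank, then keep the first row of each Name/Course pair.
    sorted <- df[order(rank), ]
    result <- sorted[!duplicated(sorted[, c("Name", "Course")]), ]
    print(result)
    ```

    Unlike the factor approach, this leaves Category as a plain character column. A category missing from the priority vector gets rank NA, and order() places NA last by default, so such rows are kept only when their Name/Course pair has no ranked alternative.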
