R: Remove duplicates from a dataframe based on categories in a column

耶瑟儿~ 2021-02-15 16:14

Here is my example data set:

      Name Course Category
 1: Jason     ML      PT
 2: Jason     ML      DI
 3: Jason     ML      GT
 4: Jason     ML      SY
 5: Ja         


        
7 answers
  •  名媛妹妹
    2021-02-15 16:21

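    I may be late, but I believe this is the simplest solution. Since you mentioned 10M rows, I propose a data.table implementation built on the easy-to-follow unique() function. The factor levels fix the order in which categories are preferred, so after sorting, unique() keeps the first (most preferred) row for each Name/Course pair.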

    library(data.table)
    df <- data.table(Name = c("Jason", "Jason", "Jason", "Jason", "Jason", "Jason", "Nancy", "Nancy", "Nancy", "Nancy", "James", "John"), Course = c("ML", "ML", "ML", "ML", "DS", "DS", "ML", "ML", "DS", "DS", "ML", "DS"), category = c("PT", "DI", "GT", "SY", "SY", "DI", "PT", "SY", "DI", "GT", "SY", "GT"))
    
    # Turn category into a factor whose level order is the preference order (modifies df by reference)
    df[, category := factor(category, levels = c("PT", "DI", "GT", "SY"))]
    # Sort by that order, then keep the first row of each Name/Course pair
    unique(df[order(category)], by = c("Name", "Course"))
    
        Name Course category
    1: Jason     ML       PT
    2: Nancy     ML       PT
    3: Jason     DS       DI
    4: Nancy     DS       DI
    5:  John     DS       GT
    6: James     ML       SY
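
For readers without data.table, the same sort-then-deduplicate idea can be sketched in base R. This is only an illustration, not part of the original answer: it assumes the preference order PT, DI, GT, SY taken from the factor levels above, and the names `priority`, `sorted`, and `result` are made up for the example. It will likely be slower than data.table on very large inputs.

```r
# Same toy data as in the answer, as a plain data.frame
df <- data.frame(
  Name     = c("Jason", "Jason", "Jason", "Jason", "Jason", "Jason",
               "Nancy", "Nancy", "Nancy", "Nancy", "James", "John"),
  Course   = c("ML", "ML", "ML", "ML", "DS", "DS",
               "ML", "ML", "DS", "DS", "ML", "DS"),
  category = c("PT", "DI", "GT", "SY", "SY", "DI",
               "PT", "SY", "DI", "GT", "SY", "GT"),
  stringsAsFactors = FALSE
)

# Assumed preference order: earlier entries win when a Name/Course pair repeats
priority <- c("PT", "DI", "GT", "SY")

# Sort rows by each category's position in the preference vector
sorted <- df[order(match(df$category, priority)), ]

# duplicated() flags every repeat of a Name/Course pair after its first row,
# so negating it keeps the most preferred row per pair
result <- sorted[!duplicated(sorted[, c("Name", "Course")]), ]
rownames(result) <- NULL
result
```

On the sample data this should give the same six Name/Course rows as the data.table output above, with `category` left as a character column rather than a factor.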
    
