Get rows of unique values by group

抹茶落季 2021-01-19 02:29

I have a data.table and want to pick the rows where the values of a variable x are unique within each group of another variable y.

It's possible to get …
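Because the question is cut off, the following assumes one plausible reading that matches the answers below: keep the first row for each distinct (y, x) combination. This minimal base-R sketch applies that reading to the sample data the answers use; the name sample_df is hypothetical.

```r
# Sample data copied from the data.table answers; 'sample_df' is a hypothetical name
sample_df <- data.frame(
  y = rep(letters[1:2], each = 3),
  x = c(1, 2, 2, 3, 2, 1),
  z = 1:6
)

# duplicated() on only the y and x columns flags every repeat of an
# earlier (y, x) pair; negating it keeps the first row of each pair
keep <- !duplicated(sample_df[, c("y", "x")])
res <- sample_df[keep, ]
print(res)
```

Only the third row, a repeat of (a, 2), is dropped.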

4 answers
  • 2021-01-19 02:59

    Thanks to dplyr:

    library(dplyr)
    col1 <- c(1, 1, 3, 3, 5, 6, 7, 8, 9)
    col2 <- c("cust1", "cust1", "cust3", "cust4", "cust5", "cust5", "cust5", "cust5", "cust6")
    df1 <- data.frame(col1, col2)
    df1
    
    # Keep one row per distinct (col1, col2) combination
    distinct(select(df1, col1, col2))
    
  • 2021-01-19 03:00

    data.table's duplicated() method works a bit differently from the base R version. Here's an approach I've seen on this site before:

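    library(data.table)
    dt <- data.table(y = rep(letters[1:2], each = 3), x = c(1, 2, 2, 3, 2, 1), z = 1:6)
    setkey(dt, "y", "x")   # sorts dt by y, then x
    key(dt)
    # [1] "y" "x"
    # Since data.table 1.9.8, duplicated() compares all columns by default,
    # so pass by = key(dt) to compare only the key columns
    !duplicated(dt, by = key(dt))
    # [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
    dt[!duplicated(dt, by = key(dt))]
    #    y x z
    # 1: a 1 1
    # 2: a 2 2
    # 3: b 1 6
    # 4: b 2 5
    # 5: b 3 4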
    
  • 2021-01-19 03:07

    A simpler data.table solution is to take the first row of each group:

    > dt[, head(.SD, 1), by=.(y, x)]
       y x z
    1: a 1 1
    2: a 2 2
    3: b 3 4
    4: b 2 5
    5: b 1 6
    
  • 2021-01-19 03:12

    The idiomatic data.table way is:

    library(data.table)
    # dt as created in the previous answer, without setkey()
    unique(dt, by = c("y", "x"))
    #    y x z
    # 1: a 1 1
    # 2: a 2 2
    # 3: b 3 4
    # 4: b 2 5
    # 5: b 1 6
    
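The dplyr answer above returns only col1 and col2. If the goal is to deduplicate on (y, x) while keeping every other column, dplyr's distinct() also accepts .keep_all = TRUE. A sketch, assuming a dplyr version that supports .keep_all and reusing the sample data from the data.table answers (sample_df is a hypothetical name):

```r
library(dplyr)

# Same sample data as the data.table answers, as a plain data frame
sample_df <- data.frame(
  y = rep(letters[1:2], each = 3),
  x = c(1, 2, 2, 3, 2, 1),
  z = 1:6
)

# Keep the first row of each (y, x) combination, retaining all columns
res <- distinct(sample_df, y, x, .keep_all = TRUE)
print(res)
```

Like unique(dt, by = c("y", "x")) on the unkeyed table, this keeps rows in their original order, so z comes out as 1, 2, 4, 5, 6.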