Get rows of unique values by group


I have a data.table and want to pick the rows where the values of a variable x are unique relative to another variable y, i.e. one row per distinct combination of y and x.

It's possible to get


4 Answers

我在风中等你

2021-01-19 02:59

Thanks to dplyr:

library(dplyr)
col1 = c(1,1,3,3,5,6,7,8,9)
col2 = c("cust1", "cust1", "cust3", "cust4", "cust5", "cust5", "cust5", "cust5", "cust6")
df1 = data.frame(col1, col2)
df1

distinct(select(df1, col1, col2))
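If the remaining columns should be kept as well, `distinct()` can be given the grouping columns directly together with `.keep_all = TRUE`. The following is only a minimal sketch on an assumed toy data frame (the columns `y`, `x`, `z` and the names `df` and `first_rows` are illustrative, not part of the answer above):

```r
library(dplyr)

# Assumed toy data: y is the grouping variable, x the value, z an extra column.
df <- data.frame(
  y = rep(letters[1:2], each = 3),
  x = c(1, 2, 2, 3, 2, 1),
  z = 1:6
)

# Keep the first row of each distinct (y, x) combination.
# .keep_all = TRUE retains the columns not named in the call (here z).
first_rows <- distinct(df, y, x, .keep_all = TRUE)
first_rows
```

Without `.keep_all = TRUE`, only the columns named in the call are returned.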


感动是毒

2021-01-19 03:00

data.table handles duplicated a bit differently. Here's the approach I've seen around here somewhere before:

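library(data.table)
dt <- data.table(y=rep(letters[1:2],each=3),x=c(1,2,2,3,2,1),z=1:6)
setkey(dt, "y", "x")
key(dt)
# [1] "y" "x"
# Since data.table 1.9.8, duplicated() compares all columns by default,
# so the key columns have to be passed explicitly via `by`.
!duplicated(dt, by = key(dt))
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
dt[!duplicated(dt, by = key(dt))]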
#    y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 1 6
# 4: b 2 5
# 5: b 3 4


滥情空心

2021-01-19 03:07
A simpler data.table solution is to grab the first row of each group:
```
> dt[, head(.SD, 1), by=.(y, x)]
   y x z
1: a 1 1
2: a 2 2
3: b 3 4
4: b 2 5
5: b 1 6
```

忘掉有多难

2021-01-19 03:12

The idiomatic data.table way is:

require(data.table)
unique(dt, by = c("y", "x"))
#    y x z
# 1: a 1 1
# 2: a 2 2
# 3: b 3 4
# 4: b 2 5
# 5: b 1 6
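
As a rough cross-check, the approaches mentioned in this thread can be compared on the same unkeyed table. This is only a sketch: the variable names (`cols`, `via_unique`, and so on) are made up for illustration, and `fromLast = TRUE` is shown as one possible way to keep the last row of each group instead of the first.

```r
library(data.table)

dt <- data.table(y = rep(letters[1:2], each = 3),
                 x = c(1, 2, 2, 3, 2, 1),
                 z = 1:6)

cols <- c("y", "x")  # columns that define a group

# Three ways to keep the first row of each (y, x) combination.
via_unique     <- unique(dt, by = cols)
via_duplicated <- dt[!duplicated(dt, by = cols)]
via_head       <- dt[, head(.SD, 1), by = cols]

# All three should pick the same rows (compared here through column z).
stopifnot(identical(via_unique$z, via_duplicated$z),
          identical(via_unique$z, via_head$z))

# Keep the last row of each group instead of the first.
last_rows <- unique(dt, by = cols, fromLast = TRUE)
last_rows
```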
