Unique on a dataframe with only selected columns


I have a dataframe with >100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it working.

4 Answers

悲哀的现实

2020-11-27 13:21
A minor update to @Joran's code.
The code below avoids the ambiguity by returning only the unique combinations of the two columns:
```
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]
```
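As a small sketch of the same idea, assuming only the two key columns are wanted in the result: calling `unique()` on the two-column subset already returns those combinations, so the row-name lookup is optional. The names `id_pairs`, `via_rownames` and `same_values` are made up for this illustration.

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# Deduplicate the two-column subset directly; no row-name lookup needed.
id_pairs <- unique(dat[, c("id", "id2")])

# The row-name version from the answer above, for comparison.
via_rownames <- dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]

# Compare the values of both results cell by cell.
same_values <- all(as.matrix(id_pairs) == as.matrix(via_rownames))
```

The row-name indexing becomes useful when the remaining columns should be kept as well, which is what the other answers do.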

天涯浪人

2020-11-27 13:24

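Using unique() on the two columns, then indexing by row names so that all columns are kept:

```
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
dat[row.names(unique(dat[, c("id", "id2")])), ]
```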


执念已碎

2020-11-27 13:28
OK, if it doesn't matter which of the values in the other columns you keep, this should be pretty easy:
```
> dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
> dat[!duplicated(dat[,c('id','id2')]),]
  id id2 somevalue
1  1   1         x
3  3   4         z
```
Inside the duplicated() call, I'm passing only those columns from dat that I don't want duplicates of. This code always keeps the first row of any duplicated combination (in this case, the row with x).
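To make the "first row wins" behaviour concrete, here is a minimal sketch, assuming the same example data. It also shows the `fromLast = TRUE` argument of `duplicated()`, which keeps the last row of each duplicated combination instead. The variable names `key`, `first_rows` and `last_rows` are illustrative only.

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# Only these columns decide whether two rows count as duplicates.
key <- dat[, c("id", "id2")]

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the first row of each (id, id2) combination.
first_rows <- dat[!duplicated(key), ]

# With fromLast = TRUE the scan runs from the bottom up,
# so the last row of each combination is kept instead.
last_rows <- dat[!duplicated(key, fromLast = TRUE), ]
```

If a specific row is needed (for example the one with the largest value in some column), sorting the data frame before calling `duplicated()` is one way to control which row comes first.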

甜味超标

2020-11-27 13:48

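Here are a few dplyr options that keep the first row for each combination of the columns id and id2:

```
library(dplyr)

# Example data, as in the other answers
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

dat %>% distinct(id, id2, .keep_all = TRUE)
dat %>% group_by(id, id2) %>% filter(row_number() == 1)
dat %>% group_by(id, id2) %>% slice(1)
```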
