Unique on a dataframe with only selected columns

逝去的感伤 2020-11-27 13:13

I have a dataframe with >100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it working.

4 Answers
  • 2020-11-27 13:21

    A minor update to @Joran's code.
    The code below avoids the ambiguity by returning only the unique combinations of the two columns:

    dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
    dat[row.names(unique(dat[,c("id", "id2")])), c("id", "id2")]
    
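    If only the two key columns are needed, the row-name lookup is probably unnecessary. A minimal sketch, assuming the same toy `dat` as above (the names `via_rownames` and `direct` are just illustrative), that calls unique() on the column subset directly:

    ```r
    dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

    # Row-name lookup, as in the answer above
    via_rownames <- dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]

    # Calling unique() on the two-column subset directly
    direct <- unique(dat[, c("id", "id2")])

    # Both keep the same rows with the same id/id2 values
    print(direct)
    ```

    The row-name lookup only becomes useful when the remaining columns are wanted as well.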
  • 2020-11-27 13:24

    Using unique():

    dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
    # unique() keeps the original row names, so they can be used to pull the full rows
    dat[row.names(unique(dat[,c("id", "id2")])),]
    
  • 2020-11-27 13:28

    Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:

    > dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
    > dat[!duplicated(dat[,c('id','id2')]),]
      id id2 somevalue
    1  1   1         x
    3  3   4         z
    

    Inside the duplicated() call, I'm passing only those columns of dat that should not contain duplicates. For each repeated (id, id2) pair, this code always keeps the first row, so here somevalue is x.

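    To make the "first of any ambiguous values" behaviour visible, here is a small sketch on the same toy data. It prints the logical vector that duplicated() returns, then uses the `fromLast = TRUE` argument of duplicated() to keep the last row of each pair instead. The variable names `key_cols`, `keep_first` and `keep_last` are made up for illustration.

    ```r
    dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    key_cols <- c("id", "id2")

    # TRUE marks a row whose (id, id2) pair already appeared further up
    print(duplicated(dat[, key_cols]))

    # Keep the first row of each pair
    keep_first <- dat[!duplicated(dat[, key_cols]), ]

    # Scan from the bottom so the last row of each pair is kept instead
    keep_last <- dat[!duplicated(dat[, key_cols], fromLast = TRUE), ]
    print(keep_last)
    ```

    Which of the two is right depends on whether the earlier or the later record should win. If neither rule fits, the data probably needs sorting before the duplicates are dropped.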
  • 2020-11-27 13:48

    Here are a couple of dplyr options that keep only the first row for each combination of the columns id and id2:

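    library(dplyr)

    # Same toy data as in the other answers, named df here
    df <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    df %>% distinct(id, id2, .keep_all = TRUE)
    df %>% group_by(id, id2) %>% filter(row_number() == 1)
    df %>% group_by(id, id2) %>% slice(1)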
    