Get rid of rows with duplicate attributes in R

悲哀的现实 2021-02-02 17:52

I have a big dataframe with columns such as:

ID, time, OS, IP

Each row of that dataframe corresponds to one entry. Within that dataframe for so

2 answers
  • 2021-02-02 18:30
    subset(data, !duplicated(data$ID))
    

    This should do the trick: it keeps the first row encountered for each ID and drops the rest.
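
    As a rough illustration of how duplicated() decides which row survives, here is a minimal sketch. The sample values are made up; only the column names (ID, time, OS, IP) come from the question. The fromLast argument is a standard option of duplicated() and is shown as an alternative, not as part of the answer above.

    ```r
    # Hypothetical sample data; only the column names come from the question.
    data <- data.frame(
      ID   = c(1, 2, 2, 3),
      time = c(10, 20, 30, 40),
      OS   = c("Linux", "Linux", "Windows", "Linux"),
      IP   = c("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4")
    )

    # duplicated() flags every repeat of an ID after its first occurrence,
    # so negating it keeps the first row seen for each ID.
    first_rows <- data[!duplicated(data$ID), ]

    # fromLast = TRUE scans from the end, keeping the last row per ID instead.
    last_rows <- data[!duplicated(data$ID, fromLast = TRUE), ]
    ```

    Which row counts as "first" depends on the current row order, so sort the data frame beforehand if a particular entry should win.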

  • 2021-02-02 18:39

    If you want to keep one row for each ID, but there is different data on each row, then you need to decide on some logic to discard the additional rows. For instance:

    df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
    df
      ID time    OS
    1  1    1 Linux
    2  2    2 Linux
    3  2    3 Linux
    4  3    4 Linux
    

    Now I will keep the maximum time value and the last OS value:

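    library(plyr)
    # x[, "ID"] repeats the ID once per row of the group, so each group yields
    # identical rows; unique() collapses them to one row per ID.
    unique(ddply(df, .(ID), function(x) data.frame(ID=x[,"ID"], time=max(x$time), OS=tail(x$OS,1))))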
      ID time    OS
    1  1    1 Linux
    2  2    3 Linux
    4  3    4 Linux
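
    If the goal is simply to keep the whole row with the largest time for each ID, a base R variant without plyr might look like the sketch below. It is not the same logic as above: it keeps one complete original row per ID rather than combining the maximum time with the last OS value. The names sorted and latest are illustrative only.

    ```r
    # Same example data as in the answer above.
    df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

    # Sort by ID, then by time, so the latest entry of each ID comes last.
    sorted <- df[order(df$ID, df$time), ]

    # Keep the last row of each ID, i.e. the row with the largest time.
    latest <- sorted[!duplicated(sorted$ID, fromLast = TRUE), ]
    ```

    Because whole rows are kept, the OS and any other columns always come from the same entry as the retained time value.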
    