Get rid of rows with duplicate attributes in R

悲哀的现实 2021-02-02 17:52

I have a big dataframe with columns such as:

ID, time, OS, IP

Each row of that dataframe corresponds to one entry. Within that dataframe for so

2 Answers
  •  余生分开走
    2021-02-02 18:39

    If you want to keep one row per ID, but the rows for an ID hold different data, you need to decide on some logic for discarding the extra rows. For instance:

    # Example data: ID 2 appears twice, with different time values.
    df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
    df
      ID time    OS
    1  1    1 Linux
    2  2    2 Linux
    3  2    3 Linux
    4  3    4 Linux
    

    Now I will keep the maximum time value and the last OS value:

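    library(plyr)
    # Split df by ID and reduce each group to its maximum time and last OS value.
    # ID=x[,"ID"] repeats the result once per input row, so unique() drops the repeats.
    unique(ddply(df, .(ID), function(x) data.frame(ID=x[,"ID"], time=max(x$time), OS=tail(x$OS,1))))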
      ID time    OS
    1  1    1 Linux
    2  2    3 Linux
    4  3    4 Linux
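
If you would rather not depend on plyr, a similar result can be sketched in base R. This is only an illustration under one assumption about the discard logic: it keeps the whole row with the largest time for each ID, which is not quite the same rule as "maximum time plus last OS" when those come from different rows. The names `ordered` and `latest` are made up for this example.

```r
# Same example data as in the answer above.
df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# Sort so that, within each ID, the row with the largest time comes first.
ordered <- df[order(df$ID, -df$time), ]

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps exactly one row per ID: the one with the largest time.
latest <- ordered[!duplicated(ordered$ID), ]
latest
```

Because the surviving rows are taken straight from the original data frame, every other column (OS, and IP in the question's data) stays consistent with the time value that was kept.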
    
