Get rid of rows with duplicate attributes in R


I have a big dataframe with columns such as:

ID, time, OS, IP

Each row of that dataframe corresponds to one entry. Within that dataframe for so


2 Answers

别跟我提以往

2021-02-02 18:30
```
subset(data,!duplicated(data$ID))
```
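This should do the trick. `duplicated()` flags every repeat of an ID after its first occurrence, so only the first row for each ID is kept.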
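As a rough sketch of how this behaves, the example below uses a made-up data frame (the values are illustrative; only the column names come from the question). It also shows `fromLast = TRUE`, which keeps the last row for each ID rather than the first.

```r
# Hypothetical sample data; the column names follow the question.
data <- data.frame(
  ID   = c(1, 2, 2, 3),
  time = 1:4,
  OS   = c("Linux", "Linux", "Windows", "Linux")
)

# Keep the first row seen for each ID (same idea as the subset() call above).
first_rows <- data[!duplicated(data$ID), ]

# Keep the last row seen for each ID instead.
last_rows <- data[!duplicated(data$ID, fromLast = TRUE), ]
```

Which row counts as "first" or "last" depends on the current row order, so sort the data frame beforehand if a particular row should win.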

余生分开走

2021-02-02 18:39

If you want to keep one row for each ID, but there is different data on each row, then you need to decide on some logic to discard the additional rows. For instance:

```
df <- data.frame(ID=c(1, 2, 2, 3), time=1:4, OS="Linux")
df
  ID time    OS
1  1    1 Linux
2  2    2 Linux
3  2    3 Linux
4  3    4 Linux
```

Now I will keep the maximum time value and the last OS value:

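```
library(plyr)
# ID=x[,"ID"] repeats the ID once per row in the group, so ddply returns
# one row per input row; unique() then collapses the identical rows.
unique(ddply(df, .(ID), function(x) data.frame(ID=x[,"ID"], time=max(x$time), OS=tail(x$OS,1))))
  ID time    OS
1  1    1 Linux
2  2    3 Linux
4  3    4 Linux
```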
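If plyr is not available, a similar result can be sketched in base R. Note that this is a slightly different rule from the one above: it keeps the whole row that has the largest `time` for each ID, rather than combining the maximum `time` with the last `OS` independently. The names `sorted` and `result` are just illustrative.

```
df <- data.frame(ID = c(1, 2, 2, 3), time = 1:4, OS = "Linux")

# Sort so that, within each ID, the row with the largest time comes last.
sorted <- df[order(df$ID, df$time), ]

# Keep only the last row of each ID, i.e. the one with the maximum time.
result <- sorted[!duplicated(sorted$ID, fromLast = TRUE), ]
```

For this sample data the two approaches agree, because every row has the same `OS` value.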
