Remove duplicated rows

前端未结

关注

 11  1783

清酒与你 2020-11-22 00:00

I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in th

11条回答

旧巷少年郎 (楼主)

2020-11-22 00:55

This problem can also be solved by selecting first row from each group where the group are the columns based on which we want to select unique values (in the example shared it is just 1st column).

Using base R :

subset(df, ave(V2, V1, FUN = seq_along) == 1)

#                      V1  V2 V3     V4 V5
#1 platform_external_dbus 202 16 google  1

In dplyr

library(dplyr)
df %>% group_by(V1) %>% slice(1L)

Or using data.table

library(data.table)
setDT(df)[, .SD[1L], by = V1]

If we need to find out unique rows based on multiple columns just add those column names in grouping part for each of the above answer.

data

df <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L), 
.Label = "platform_external_dbus", class = "factor"), 
V2 = c(202L, 202L, 202L, 202L, 202L), V3 = c(16L, 16L, 16L, 
16L, 16L), V4 = structure(c(1L, 4L, 3L, 5L, 2L), .Label = c("google", 
"hughsie", "localhost", "space-ghost.verbum", "users.sourceforge"
), class = "factor"), V5 = c(1L, 1L, 1L, 8L, 1L)), class = "data.frame", 
row.names = c(NA, -5L))

0 讨论(0)

查看其它11个回答