Remove duplicated rows

清酒与你 2020-11-22 00:00

I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in th

11 answers
  •  旧巷少年郎
    2020-11-22 00:55

    This problem can also be solved by selecting the first row from each group, where the groups are defined by the columns whose values should be unique (in the example shared, that is just the first column).

    Using base R:

    subset(df, ave(V2, V1, FUN = seq_along) == 1)
    
    #                      V1  V2 V3     V4 V5
    #1 platform_external_dbus 202 16 google  1
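
    To make the base R one-liner easier to follow, here is a minimal sketch of what ave() returns before the comparison with 1. The data frame toy and its values are hypothetical, chosen only so that there is more than one group; they are not from the question.

    ```r
    # Hypothetical toy data (not from the question): two groups in column V1.
    toy <- data.frame(V1 = c("a", "a", "b", "a", "b"),
                      V2 = c(10L, 20L, 30L, 40L, 50L))

    # ave() applies FUN within each group of V1 and returns the results
    # in the original row order; seq_along numbers the rows inside each group.
    idx <- ave(toy$V2, toy$V1, FUN = seq_along)
    idx

    # Keeping only the rows whose within-group index is 1
    # keeps the first row of every group.
    first_rows <- subset(toy, idx == 1)
    first_rows
    ```

    The first argument of ave() only supplies a vector of the right length to be split; the values themselves are replaced by the within-group counter.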
    

    Using dplyr:

    library(dplyr)
    df %>% group_by(V1) %>% slice(1L)
    

    Or using data.table:

    library(data.table)
    setDT(df)[, .SD[1L], by = V1]
    

    To find unique rows based on multiple columns, add those column names to the grouping part of any of the answers above.

    Data:
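
    As a sketch of the multi-column case in base R, the example below groups by two columns at once. The data frame toy is hypothetical and only serves to illustrate the idea; it is not the data from the question.

    ```r
    # Hypothetical toy data (not from the question).
    toy <- data.frame(V1 = c("a", "a", "a", "b", "b"),
                      V2 = c(1L, 1L, 2L, 1L, 1L),
                      V3 = c(10L, 20L, 30L, 40L, 50L))

    # Every argument after the first one in ave() is a grouping column,
    # so passing V1 and V2 groups the rows by each (V1, V2) combination.
    first_rows <- subset(toy, ave(V3, V1, V2, FUN = seq_along) == 1)
    first_rows
    ```

    The same change in the other two approaches would presumably be group_by(V1, V2) for dplyr and by = .(V1, V2) for data.table.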

    df <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L), 
    .Label = "platform_external_dbus", class = "factor"), 
    V2 = c(202L, 202L, 202L, 202L, 202L), V3 = c(16L, 16L, 16L, 
    16L, 16L), V4 = structure(c(1L, 4L, 3L, 5L, 2L), .Label = c("google", 
    "hughsie", "localhost", "space-ghost.verbum", "users.sourceforge"
    ), class = "factor"), V5 = c(1L, 1L, 1L, 8L, 1L)), class = "data.frame", 
    row.names = c(NA, -5L))
    
