Remove duplicated rows

清酒与你 2020-11-22 00:00

I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in th

11 answers
  •  忘掉有多难
    2020-11-22 00:37

    The function distinct() in the dplyr package performs arbitrary duplicate removal, either from specific columns/variables (as in this question) or considering all columns/variables. dplyr is part of the tidyverse.

    Data and package

    library(dplyr)
    dat <- data.frame(a = rep(c(1,2),4), b = rep(LETTERS[1:4],2))
    

    Remove rows duplicated in a specific column (e.g., column a):

    Note that .keep_all = TRUE retains all columns, otherwise only column a would be retained.

    distinct(dat, a, .keep_all = TRUE)
    
      a b
    1 1 A
    2 2 B
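
To make the effect of .keep_all concrete, here is a minimal sketch that runs both variants on the same toy data. The names only_a and with_b are made up for illustration; dplyr is assumed to be installed.

```r
library(dplyr)

# Same toy data as in the answer
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# Default (.keep_all = FALSE): only the column(s) named in the call survive
only_a <- distinct(dat, a)

# .keep_all = TRUE: the first row for each distinct value of a is kept whole
with_b <- distinct(dat, a, .keep_all = TRUE)

names(only_a)   # just "a"
names(with_b)   # "a" and "b"
```

In both cases the first occurrence of each value of a is the row that is kept.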
    

    Remove rows that are complete duplicates of other rows:

    distinct(dat)
    
      a b
    1 1 A
    2 2 B
    3 1 C
    4 2 D
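
If dplyr is not available, roughly the same results can be obtained with base R's duplicated(). This is a sketch of an alternative, not part of the answer above; dedup_a and dedup_all are illustrative names.

```r
# Same toy data as in the answer
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the first row for each value of column a
dedup_a <- dat[!duplicated(dat$a), ]

# Applied to the whole data frame, it flags rows that repeat an earlier row
dedup_all <- dat[!duplicated(dat), ]

nrow(dedup_a)    # 2
nrow(dedup_all)  # 4
```

Unlike distinct(), subsetting this way keeps the original row names rather than renumbering them.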
    
