How can I remove all duplicates so that NONE are left in a data frame?

Asked by 忘了有多久 on 2020-11-22 08:31

There is a similar question for PHP, but I'm working with R and am unable to translate the solution to my problem.

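I have a data frame with 10 rows and 50 columns, and I want to remove every row that has a duplicate, so that none of the duplicated rows are left (not even one copy).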

3 Answers
  • 2020-11-22 09:10

    This will extract the rows which appear only once (assuming your data frame is named df):

    df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
    

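    How it works: duplicated() returns TRUE for each row that is identical to a row seen earlier, scanning from the first row down, so the first occurrence of a repeated row is FALSE and later copies are TRUE. With fromLast = TRUE it scans from the last row up, so the last occurrence is FALSE and the earlier copies are TRUE.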

    Both logical vectors are combined with | (logical OR), giving a vector that is TRUE for every row that appears more than once, including all of its copies. Negating it with ! gives a logical vector that is TRUE only for rows that appear exactly once, and that vector is used to select the rows.
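    To see the two passes side by side, here is a small sketch on a made-up four-row data frame in which rows 2 and 4 are identical (the names first_pass, last_pass and keep are illustrative only):

    ```r
    # Made-up example: rows 2 and 4 are identical
    df <- data.frame(x = c(1, 2, 3, 2), y = c("a", "b", "c", "b"))

    first_pass <- duplicated(df)                   # FALSE FALSE FALSE TRUE  (later copy flagged)
    last_pass  <- duplicated(df, fromLast = TRUE)  # FALSE TRUE  FALSE FALSE (earlier copy flagged)
    keep <- !(first_pass | last_pass)              # TRUE only for rows that occur exactly once

    print(keep)        # TRUE FALSE TRUE FALSE
    unique_rows <- df[keep, ]
    print(unique_rows) # rows 1 and 3 only
    ```

    Neither pass alone removes both copies of row 2/4; only their combination does.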

  • 2020-11-22 09:17

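    Try this (it uses only base R functions, so no package needs to be loaded):

    # Example data: the rows with Part 3 and 5 appear in both data frames
    DF1 <- data.frame(Part = c(1,2,3,4,5), Age = c(23,34,23,25,24), B.P = c(87,76,75,75,78))
    
    DF2 <- data.frame(Part = c(3,5), Age = c(23,24), B.P = c(75,78))
    
    # Stack them, so the rows for parts 3 and 5 are now duplicated
    DF3 <- rbind(DF1, DF2)
    
    # Drop every copy of a duplicated row, keeping only rows that occur once
    DF3 <- DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), ]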
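    If you need this in several places, one option is to wrap it in a small function. The name drop_all_duplicates is hypothetical, not part of base R. It passes drop = FALSE so that a one-column data frame stays a data frame instead of being simplified to a vector:

    ```r
    # Hypothetical helper: keep only rows that occur exactly once
    drop_all_duplicates <- function(d) {
      dup_any <- duplicated(d) | duplicated(d, fromLast = TRUE)
      d[!dup_any, , drop = FALSE]  # drop = FALSE keeps a data frame even with one column
    }

    DF1 <- data.frame(Part = c(1, 2, 3, 4, 5), Age = c(23, 34, 23, 25, 24), B.P = c(87, 76, 75, 75, 78))
    DF2 <- data.frame(Part = c(3, 5), Age = c(23, 24), B.P = c(75, 78))
    DF3 <- drop_all_duplicates(rbind(DF1, DF2))
    print(DF3$Part)         # 1 2 4

    # One-column case: without drop = FALSE this would come back as a plain vector
    one_col <- drop_all_duplicates(data.frame(a = c(1, 1, 2)))
    print(class(one_col))   # "data.frame"
    ```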
    
  • 2020-11-22 09:30

    A possible dplyr solution, which groups by all columns and keeps only groups of size one:

    library(dplyr)

    df %>%
     group_by_all() %>%
     filter(n() == 1)
    

    Or:

    df %>%
     group_by_all() %>%
     filter(!any(row_number() > 1))
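    In dplyr 1.0 and later, group_by_all() is superseded by group_by(across(everything())). A sketch of the same filter in that style, on made-up example data, with ungroup() added so the result is not left grouped:

    ```r
    library(dplyr)

    # Made-up example: rows 2 and 4 are identical
    df <- data.frame(x = c(1, 2, 3, 2), y = c("a", "b", "c", "b"))

    result <- df %>%
      group_by(across(everything())) %>%  # one group per distinct row
      filter(n() == 1) %>%                # keep only groups of size 1
      ungroup()                           # drop the grouping from the result

    print(result)  # the rows (1, "a") and (3, "c")
    ```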
    