Remove duplicate column combinations from a data frame in R

夕颜 2020-12-06 14:06

I want to remove duplicate combinations of sessionid, qf, and qn from the following data:

               sessionid             qf        qn         city
1  9         


        
4 Answers
  • 2020-12-06 14:48

    To address your sorting problem, first read in your example data:

    dat <- read.table(text = "               sessionid             qf        qn         city
    1  9cf571c8faa67cad2aa9ff41f3a26e38     cat   biddix          fresno
    2  e30f853d4e54604fd62858badb68113a   caleb     amos             NA   
    3  2ad41134cc285bcc06892fd68a471cd7  daniel  folkers             NA   
    4  2ad41134cc285bcc06892fd68a471cd7  daniel  folkers             NA   
    5  63a5e839510a647c1ff3b8aed684c2a5 charles   pierce           flint
    6  691df47f2df12f14f000f9a17d1cc40e       j    franz prescott+valley
    7  691df47f2df12f14f000f9a17d1cc40e       j    franz prescott+valley
    8  b3a1476aa37ae4b799495256324a8d3d  carrie mascorro            brea
    9  bd9f1404b313415e7e7b8769376d2705    fred  morales       las+vegas
    10 b50a610292803dc302f24ae507ea853a  aurora      lee              NA  
    11 fb74940e6feb0dc61a1b4d09fcbbcb37  andrew    price       yorkville ", sep = "", header = TRUE)
    

    Then you can use arrange() from plyr:

    library(plyr)
    arrange(dat, sessionid, qf, qn)
    

    Or, using base functions:

    with(dat, dat[order(sessionid, qf, qn), ])
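
    If the goal is to drop the repeated combinations as well as sort, the two steps can be chained. A minimal sketch, assuming a small made-up data frame `dat_small` that mimics the question's columns (the short ids and all variable names are illustrative, not the real data):

```r
# Hypothetical miniature of the question's data (short ids for readability)
dat_small <- data.frame(
  sessionid = c("s3", "s1", "s2", "s2", "s1"),
  qf        = c("j", "cat", "daniel", "daniel", "cat"),
  qn        = c("franz", "biddix", "folkers", "folkers", "biddix"),
  city      = c("prescott+valley", "fresno", NA, NA, "fresno"),
  stringsAsFactors = FALSE
)

keys <- c("sessionid", "qf", "qn")

# Keep the first row of each sessionid/qf/qn combination
deduped <- dat_small[!duplicated(dat_small[keys]), ]

# Sort the remaining rows by the same three columns
sorted <- with(deduped, deduped[order(sessionid, qf, qn), ])
rownames(sorted) <- NULL
```

    Here `sorted` holds one row per combination, ordered by sessionid, then qf, then qn.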
    
  • 2020-12-06 14:48

    To drop every row whose combination occurs more than once (keeping none of the copies), use duplicated() twice, once in each direction:

    > df
    
      a  b c    d
    1 1  2 A 1001
    2 2  4 B 1002
    3 3  6 B 1002
    4 4  8 C 1003
    5 5 10 D 1004
    6 6 12 D 1004
    7 7 13 E 1005
    8 8 14 E 1006
    
    > df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]
    
      a  b c    d
    1 1  2 A 1001
    4 4  8 C 1003
    7 7 13 E 1005
    8 8 14 E 1006
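
    The same filter can also be written by counting how often each c/d combination occurs and keeping the rows whose count is one. A sketch that rebuilds the df printed above; the names `key`, `n`, `singletons` and `twice` are illustrative, and pasting with "|" assumes that character does not appear in the data:

```r
df <- data.frame(
  a = 1:8,
  b = c(2, 4, 6, 8, 10, 12, 13, 14),
  c = c("A", "B", "B", "C", "D", "D", "E", "E"),
  d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006),
  stringsAsFactors = FALSE
)

# One label per (c, d) combination
key <- paste(df$c, df$d, sep = "|")

# For every row, the number of rows sharing its label
n <- ave(seq_along(key), key, FUN = length)

# Keep only combinations that occur exactly once
singletons <- df[n == 1, ]

# The double-duplicated() version, for comparison
twice <- df[!(duplicated(df[c("c", "d")]) |
              duplicated(df[c("c", "d")], fromLast = TRUE)), ]
```

    Both `singletons` and `twice` should contain rows 1, 4, 7 and 8. The counting form is handy when the threshold is something other than one.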
    
  • 2020-12-06 14:53

    duplicated() has a method for data.frames, which is designed for just this sort of task:

    df <- data.frame(a = c(1:4, 1:4), 
                     b = c(4:1, 4:1), 
                     d = LETTERS[1:8])
    
    df[!duplicated(df[c("a", "b")]),]
    #   a b d
    # 1 1 4 A
    # 2 2 3 B
    # 3 3 2 C
    # 4 4 1 D
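
    By default the first row of each combination is kept. To keep the last one instead, duplicated() accepts fromLast = TRUE. A short sketch that rebuilds the same example data (`last_kept` is an illustrative name):

```r
df <- data.frame(a = c(1:4, 1:4),
                 b = c(4:1, 4:1),
                 d = LETTERS[1:8])

# Scan from the bottom, so the earlier copies are the ones flagged as duplicates
last_kept <- df[!duplicated(df[c("a", "b")], fromLast = TRUE), ]

# Rows 5 to 8 remain, so d is E, F, G, H
as.character(last_kept$d)
```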
    
  • 2020-12-06 15:06

    In your example, the duplicated rows are identical in every column, so unique(), which has a data.frame method, is enough:

    udf <- unique(my.data.frame)
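
    Note that unique() compares whole rows, so it only helps when the remaining columns repeat too. A sketch with a made-up data frame `x` (all names hypothetical) in which the key columns repeat but a third column does not:

```r
# id and grp repeat in rows 1 and 2, but val differs
x <- data.frame(id  = c(1, 1, 2),
                grp = c("a", "a", "b"),
                val = c(10, 20, 30),
                stringsAsFactors = FALSE)

# Whole-row comparison: nothing is removed
whole_rows <- unique(x)

# unique() on the key columns alone deduplicates, but val is lost
keys_only <- unique(x[c("id", "grp")])

# duplicated() on the key columns keeps every column of the first match
first_match <- x[!duplicated(x[c("id", "grp")]), ]
```

    `whole_rows` still has three rows, while `keys_only` and `first_match` have two; only `first_match` retains val.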
    

    As for sorting... joran just posted the answer.
