Combining duplicated rows in R and adding new column containing IDs of duplicates

旧巷少年郎 2020-12-01 19:35

I have a data frame that looks like this:

Chr start   stop    ref alt Hom/het ID  
chr1    5179574 5183384 ref Del Het 719  
chr1    5179574 5184738 ref Del          


        
1 Answer
  • 2020-12-01 20:14

    A call to aggregate() that groups by every column except ID should do the trick.

    Here's an option that collects the IDs in a list column:

    (df1 <- aggregate(df[7], df[-7], unique))
    #   Chr   start    stop ref alt Hom.het       ID
    # 1 chr1 5179574 5183384 ref Del     Het      719
    # 2 chr1 5179574 5184738 ref Del     Het 915, 951
    # 3 chr1 5336806 5358384 ref Del     Het      376
    # 4 chr1 5347979 5358384 ref Del     Het      228
    

    And here's one that collapses each group's IDs into a single comma-separated string, giving a character vector:

    df2 <- aggregate(df[7], df[-7], 
                     FUN = function(X) paste(unique(X), collapse=", "))
    

    Comparing the results of the two options:

    str(df1$ID)
    # List of 4
    #  $ 0: int 719
    #  $ 3: int [1:2] 915 951
    #  $ 7: int 376
    #  $ 8: int 228
    
    str(df2$ID)
    # chr [1:4] "719" "915, 951" "376" "228"
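
    To try both calls end to end, a small self-contained sketch may help. The sample data in the question is cut off, so the data frame below is a hypothetical reconstruction from the results printed above (including the column name Hom.het as it appears there); the asker's real data may differ.

    ```r
    # Hypothetical input rebuilt from the printed results above
    # (an assumption, not the asker's actual data).
    df <- data.frame(
      Chr     = "chr1",
      start   = c(5179574L, 5179574L, 5179574L, 5336806L, 5347979L),
      stop    = c(5183384L, 5184738L, 5184738L, 5358384L, 5358384L),
      ref     = "ref",
      alt     = "Del",
      Hom.het = "Het",
      ID      = c(719L, 915L, 951L, 376L, 228L),
      stringsAsFactors = FALSE
    )

    # Option 1: group by every column except ID (column 7).
    # The groups have different numbers of unique IDs, so ID becomes a list column.
    df1 <- aggregate(df[7], df[-7], unique)

    # Option 2: collapse each group's unique IDs into one string.
    df2 <- aggregate(df[7], df[-7],
                     FUN = function(x) paste(unique(x), collapse = ", "))

    is.list(df1$ID)                # TRUE
    is.character(df2$ID)           # TRUE
    df2$ID[df2$stop == 5184738L]   # "915, 951"
    ```

    One thing to watch with the first option: if every group happened to have the same number of unique IDs, aggregate() would simplify the result to a plain vector or matrix rather than a list. Passing simplify = FALSE should keep the list column in that case.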
    