How to output duplicated rows

情话喂你 2020-12-04 03:39

I have the following data:

x1  x2  x3  x4
34  14  45  53 
2   8   18  17
34  14  45  20
19  78  21  48 
2   8   18  5
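
For reproducibility, the table above can be built as a data frame. This is only a sketch: the name dat is an assumption chosen to match the answers below, and the values are copied from the table.

```r
# Assumed name `dat`; values copied from the table in the question.
dat <- data.frame(
  x1 = c(34, 2, 34, 19, 2),
  x2 = c(14, 8, 14, 78, 8),
  x3 = c(45, 18, 45, 21, 18),
  x4 = c(53, 17, 20, 48, 5)
)
```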

In rows 1 and 3; and

6 Answers
  • 2020-12-04 04:17

    Another way to answer both parts of the question, using two packages:

    library(DescTools)
    library(dplyr)
    dat[AllDuplicated(dat[1:3]), ] %>% # keep every row whose x1..x3 values occur more than once
      group_by(x1, x2) %>% # then sum x4 within each group
      mutate(x4 = sum(x4)) %>%
      unique()
    # Source: local data frame [2 x 4]
    # Groups: x1, x2
    # 
    #   x1 x2 x3 x4
    # 1 34 14 45 73
    # 2  2  8 18 22
    
  • 2020-12-04 04:23

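    You can do this with duplicated, which flags duplicated rows when passed a matrix or data frame. Since you're only checking the first three columns, pass dat[,-4] to the function. By default duplicated marks only the later occurrences, so combine it with a second call using fromLast=TRUE to keep the first occurrences too.

    dat[duplicated(dat[,-4]) | duplicated(dat[,-4], fromLast=TRUE),]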
    #   x1 x2 x3 x4
    # 1 34 14 45 53
    # 2  2  8 18 17
    # 3 34 14 45 20
    # 5  2  8 18  5
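
    To see why both calls are needed, here is a small sketch that looks at each pass separately. It assumes dat holds the question's data (rebuilt here so the snippet runs on its own); the names key, later and earlier are only illustrative.

    ```r
    # Assumed data frame `dat`, copied from the question.
    dat <- data.frame(
      x1 = c(34, 2, 34, 19, 2),
      x2 = c(14, 8, 14, 78, 8),
      x3 = c(45, 18, 45, 21, 18),
      x4 = c(53, 17, 20, 48, 5)
    )

    key <- dat[, -4]                             # compare on x1, x2, x3 only
    later   <- duplicated(key)                   # TRUE for repeats of an earlier row
    earlier <- duplicated(key, fromLast = TRUE)  # TRUE for rows repeated further down

    which(later)            # rows 3 and 5
    which(earlier)          # rows 1 and 2
    which(later | earlier)  # rows 1, 2, 3 and 5
    ```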
    
  • 2020-12-04 04:23

    An alternative using ave:

    dat[ave(dat[,1], dat[-4], FUN=length) > 1,]
    
    #  x1 x2 x3 x4
    #1 34 14 45 53
    #2  2  8 18 17
    #3 34 14 45 20
    #5  2  8 18  5
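
    As a rough illustration of what ave does here, the sketch below prints the per-row group sizes before they are compared with 1. It assumes dat holds the question's data (rebuilt so the snippet is self-contained); the name n is only illustrative.

    ```r
    # Assumed data frame `dat`, copied from the question.
    dat <- data.frame(
      x1 = c(34, 2, 34, 19, 2),
      x2 = c(14, 8, 14, 78, 8),
      x3 = c(45, 18, 45, 21, 18),
      x4 = c(53, 17, 20, 48, 5)
    )

    # For each row, count how many rows share its (x1, x2, x3) combination.
    # The first argument only supplies a vector of the right length.
    n <- ave(dat[, 1], dat[-4], FUN = length)
    n        # 2 2 2 1 2: only row 4 is unique
    n > 1    # logical index used to subset dat
    ```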
    
  • 2020-12-04 04:25

    The first part is similar to the answers above. Let z be your data.frame:

     library(DescTools)
     (zz <- Sort(z[AllDuplicated(z[, -4]), ], decreasing=TRUE) )
    
     # now aggregate
     aggregate(zz[, 4], zz[, -4], FUN=sum)
    
     # use Sort again, if needed...
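
    If DescTools is not available, the same two steps can be sketched in base R. This is an assumption-laden variant rather than the answer's own method: it assumes dat holds the question's data (rebuilt here), and the names dup and res are only illustrative.

    ```r
    # Assumed data frame `dat`, copied from the question.
    dat <- data.frame(
      x1 = c(34, 2, 34, 19, 2),
      x2 = c(14, 8, 14, 78, 8),
      x3 = c(45, 18, 45, 21, 18),
      x4 = c(53, 17, 20, 48, 5)
    )

    # Step 1: keep every row whose (x1, x2, x3) combination occurs more than once.
    key <- dat[, -4]
    dup <- dat[duplicated(key) | duplicated(key, fromLast = TRUE), ]

    # Step 2: sum x4 within each (x1, x2, x3) group.
    res <- aggregate(x4 ~ x1 + x2 + x3, data = dup, FUN = sum)
    res
    ```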
    
  • 2020-12-04 04:26

    Learned this one the other day. You won't need to re-order the output.

    s <- split(dat, do.call(paste, dat[-4]))
    Reduce(rbind, Filter(function(x) nrow(x) > 1, s))
    #   x1 x2 x3 x4
    # 2  2  8 18 17
    # 5  2  8 18  5
    # 1 34 14 45 53
    # 3 34 14 45 20
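
    To see why the rows come back grouped, it may help to look at the keys that split works on. A small sketch, assuming dat holds the question's data (rebuilt so the snippet runs on its own); the names keys and sizes are only illustrative.

    ```r
    # Assumed data frame `dat`, copied from the question.
    dat <- data.frame(
      x1 = c(34, 2, 34, 19, 2),
      x2 = c(14, 8, 14, 78, 8),
      x3 = c(45, 18, 45, 21, 18),
      x4 = c(53, 17, 20, 48, 5)
    )

    # One string key per row, built from x1, x2 and x3.
    keys <- do.call(paste, dat[-4])
    keys

    # split() makes one data frame per distinct key; Filter() then keeps
    # only the groups with more than one row.
    s <- split(dat, keys)
    sizes <- vapply(s, nrow, integer(1))
    sizes
    ```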
    
  • 2020-12-04 04:27

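    You can also use the table function. Note that this checks x1 and x2 separately rather than as a combination, which is enough for this data:

    > d1 = ddf[ddf$x1 %in% names(which(table(ddf$x1) > 1)),]  # rows whose x1 value occurs more than once
    > d2 = ddf[ddf$x2 %in% names(which(table(ddf$x2) > 1)),]  # same for x2
    > rr = rbind(d1, d2)
    > rr2 = rr[!duplicated(rr),]  # drop rows that were selected twice
    > rr2
      x1 x2 x3 x4
    1 34 14 45 53
    2  2  8 18 17
    3 34 14 45 20
    5  2  8 18  5
    

    To sum the last column:

    > library(data.table)
    > rrt = data.table(rr2)
    > rrt[,x4:=sum(x4),by=x1]
    > rrt[!duplicated(x1)]
       x1 x2 x3 x4
    1: 34 14 45 73
    2:  2  8 18 22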
    