how do I search for columns with same name, add the column values and replace these columns with same name by their sum? Using R

后端 未结 4 636
情话喂你
情话喂你 2021-01-05 20:56

I have a data frame where some consecutive columns have the same name. I need to search for these, add their values in for each row, drop one column and replace the other wi

相关标签:
4条回答
  • 2021-01-05 21:00
    > dfrm <- data.frame(a = 1:10, b= 1:10, cc= 1:10, dd=1:10, ee=1:10)
    > names(dfrm) <- c("a", "a", "b", "b", "b")
    > sapply(unique(names(dfrm)[duplicated(names(dfrm))]), 
          function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
           a  b
     [1,]  2  3
     [2,]  4  6
     [3,]  6  9
     [4,]  8 12
     [5,] 10 15
     [6,] 12 18
     [7,] 14 21
     [8,] 16 24
     [9,] 18 27
    [10,] 20 30
    

    EDIT 2: Using rowSums allows simplification of the first sapply argumentto just unique(names(dfrm)) at the expense of needing to remember to include drop=FALSE in "[":

    sapply(unique(names(dfrm)), 
           function(x) rowSums( dfrm[ , grep(x, names(dfrm)), drop=FALSE]) )
    

    To deal with NA's:

    sapply(unique(names(dfrm)), 
          function(x) apply(dfrm[grep(x, names(dfrm))], 1, 
                  function(y) if ( all(is.na(y)) ) {NA} else { sum(y, na.rm=TRUE) }
           )               )
    

    (Edit note: addressed Tommy counter-example by putting unique around the names(.)[.] construction. The erroneous code was:

    sapply(names(dfrm)[unique(duplicated(names(dfrm)))], 
         function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
    
    0 讨论(0)
  • 2021-01-05 21:16

    Some sample data.

    dfr <- data.frame(
      foo = rnorm(20),
      bar = 1:20,
      bar = runif(20),
      check.names = FALSE
    )
    

    Method: Loop over unique column names; if there is only one of that name, then selecting all columns with that nme will return a vector, but if there are duplicates it will also be a data frame. Use rowSums to sum over rows. (Duh. EDIT: Not quite as 'duh' as previously thought!) lapply returns a list, which we need to reform into a data frame, and finally we fix the names. EDIT: sapply avoids the need for the last step.

    unique_col_names <- unique(colnames(dfr))
    new_dfr <- sapply(unique_col_names, function(name)
    {
      subs <- dfr[, colnames(dfr) == name]
      if(is.data.frame(subs))
        rowSums(subs)
      else
        subs
    })
    
    0 讨论(0)
  • 2021-01-05 21:27

    Here is my one liner

    # transpose data frame, sum by group = rowname, transpose back.
    t(rowsum(t(dfrm), group = rownames(t(dfrm))))
    
    0 讨论(0)
  • 2021-01-05 21:27

    One way is to identify duplcates using (surprise) the duplicated function, and then loop through them to calculate the sums. Here is an example:

    dat.dup <- data.frame(x=1:10, x=1:10, x=1:10, y=1:10, y=1:10, z=1:10, check.names=FALSE)
    dups <- unique(names(dat.dup)[duplicated(names(dat.dup))])
    for (i in dups) {
    dat.dup[[i]] <- rowSums(dat.dup[names(dat.dup) == i])
    }
    dat <- dat.dup[!duplicated(names(dat.dup))]
    
    0 讨论(0)
提交回复
热议问题