Extending Suffixes in Merge to All Non-by Columns

后悔当初 2021-01-04 12:06

suffixes in merge works only on common column names. Is there any way to extend this to the rest of the columns as well without manually updating co

3 Answers
  • 2021-01-04 12:29

    This is an interesting question, and I doubt that extending merge would be a straightforward solution unless Matt Dowle and Co. think it's something worth implementing in merge.data.table.

    Here's one approach that came to mind:

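    library(data.table)  # provides setnames()
    
    DTs <- c("df1", "df2")
    suffixes <- seq_along(DTs)
    
    for (i in seq_along(DTs)) {
      # Every column except the merge key "a" gets this table's suffix
      Name <- setdiff(colnames(get(DTs[i])), "a")
      # setnames() renames by reference, so df1 and df2 themselves are modified
      setnames(get(DTs[i]), Name, paste(Name, suffixes[i], sep = "."))
    }
    
    merge(df1, df2, by = "a") # No non-key names clash now, so every column keeps its suffix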
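    
    Keep in mind that the loop above renames the columns of df1 and df2 themselves. If the originals need to stay as they are, the same idea can be applied to copies. The following is only a sketch in base R: the sample tables and the add_suffix helper are made up for illustration and are not part of data.table.

    ```r
    # Hypothetical sample data, chosen to match the column names used in this thread
    df1 <- data.frame(a = 1:3, b = 4:6, d = 7:9)
    df2 <- data.frame(a = 1:3, d = 10:12)

    # Hypothetical helper: returns a renamed copy and leaves its input untouched
    add_suffix <- function(df, sfx, by, sep = ".") {
      idx <- !(names(df) %in% by)   # every column that is not a merge key
      names(df)[idx] <- paste(names(df)[idx], sfx, sep = sep)
      df
    }

    merged <- merge(add_suffix(df1, 1, "a"), add_suffix(df2, 2, "a"), by = "a")
    names(merged)
    # [1] "a"   "b.1" "d.1" "d.2"
    ```

    Because add_suffix returns a modified copy, df1 and df2 keep their original column names afterwards, at the cost of copying each table once.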
    
  • 2021-01-04 12:32

    Try the following:

    colnames(
      mergeWithSuffix(df1,df2, by = 'a', suffixes = c("1","2"))
    )
    [1] "a"   "b.1" "d.1" "d.2"
    

    Notice that the original data.frames are unharmed.

    colnames(df1)
    [1] "a" "b" "d"
    
    colnames(df2)
    [1] "a" "d"
    

    The functions are as follows

    require(data.table)
    
    mergeWithSuffix <- function(x, y, by, suffixes=NULL, ...) {
    
      # Add Suffixes
      mkSuffix(x, suffixes[[1]], merge.col=by)
      mkSuffix(y, suffixes[[2]], merge.col=by)
    
      # Merge (after renaming, no non-key names collide, so merge's own suffixes never apply)
      ret <- merge(x, y, by = by, ...)
    
      # Remove Suffixes
      undoSuffix(x, suffixes[[1]], merge.col=by)
      undoSuffix(y, suffixes[[2]], merge.col=by)
      return(ret)
    }
    
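    mkSuffix <- function(x, sfx, sep=".", merge.col=NULL)  {
      nms <- setdiff(names(x), merge.col)
      # Use the sep argument rather than a hard-coded "."
      setnames(x, nms, paste(nms, sfx, sep=sep))
    }
    
    undoSuffix <- function(x, sfx, sep=".", merge.col=NULL) {
      nms <- setdiff(names(x), merge.col)
      # Drop the trailing sep + sfx, compared literally rather than as a regex
      sfx.str <- paste0(sep, sfx)
      hit <- endsWith(nms, sfx.str)
      new <- nms
      new[hit] <- substr(nms[hit], 1, nchar(nms[hit]) - nchar(sfx.str))
      setnames(x, nms, new)
    }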
    

    Note that setnames works by reference, so the overhead is almost negligible. Also, as discussed elsewhere, this works equally well on data.frames and data.tables.

  • 2021-01-04 12:33

    A simple solution:

    mrg <- merge(df1, df2, by = "a", suffixes = c("1", "2"))
    setnames(mrg, paste0(names(mrg), ifelse(names(mrg) %in% setdiff(names(df1), names(df2)), "1", "")))
    setnames(mrg, paste0(names(mrg), ifelse(names(mrg) %in% setdiff(names(df2), names(df1)), "2", "")))
    
    > names(mrg)
    [1] "a"  "b1" "d1" "d2"
    

    Edit: thanks to comments by Ricardo Saporta for cleaning this up considerably and teaching me a few new tips!
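
    The two setnames calls can also be wrapped in a small function. What follows is a rough sketch in base R, so it returns a renamed copy instead of renaming by reference; merge_suffix_all and the sample tables are hypothetical names used only for illustration. Both membership tests are evaluated before anything is renamed, so the second test cannot accidentally match a name produced by the first.

    ```r
    # Hypothetical sample data with the same column names as in this thread
    df1 <- data.frame(a = 1:3, b = 4:6, d = 7:9)
    df2 <- data.frame(a = 1:3, d = 10:12)

    # Hypothetical wrapper: merge first, then suffix the columns merge left alone
    merge_suffix_all <- function(x, y, by, suffixes = c("1", "2")) {
      out <- merge(x, y, by = by, suffixes = suffixes)
      nm <- names(out)
      only_x <- nm %in% setdiff(names(x), names(y))  # columns found only in x
      only_y <- nm %in% setdiff(names(y), names(x))  # columns found only in y
      nm[only_x] <- paste0(nm[only_x], suffixes[1])
      nm[only_y] <- paste0(nm[only_y], suffixes[2])
      names(out) <- nm
      out
    }

    mrg <- merge_suffix_all(df1, df2, by = "a")
    names(mrg)
    # [1] "a"  "b1" "d1" "d2"
    ```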
