Using one data.frame to update another

后端未结


Given 2 data frames that are identical in terms of column names/datatypes, where some columns uniquely identify the rows, is there an efficient function/method for one data.frame to "update" the other?
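For concreteness, here is the example data the answers below work with. The same definitions appear in the unit tests of update.df.with.df() further down. Name and Id identify the rows. The question itself spelled the last column of replacement as 'value2', a typo that the first answer discusses; here it is written Value2, and goal is the result one would expect after the update.

```r
# Example data from the question (as used in the unit tests further down)
original <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                       Value1 = c(1.2, NA), Value2 = c(NA, 9.2))
replacement <- data.frame(Name = "john", Id = 2, Value1 = 2.2, Value2 = 5.9)

# Desired result: john's row takes its values from `replacement`
goal <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                   Value1 = c(1.2, 2.2), Value2 = c(NA, 5.9))
```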


7 answers

广开言路

2020-12-20 12:35
Using base R, you can use the function replace.df() below, which is loosely based on the source code of merge.data.frame(). Unlike some other solutions, it allows multiple columns to be used for identification. I use it quite often at work. Feel free to copy and use it.

This function controls for cases where rows in y are not found in x. Mind that the function does not check whether the key combinations are unique: because it relies on match(), only the first occurrence of a key combination in x is updated.
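To illustrate the "first occurrence" behaviour: match() returns, for each element of its first argument, the position of its first match in the second argument, so a key that appears twice in x only ever points at its first row. A small sketch with made-up keys:

```r
# Hypothetical keys: "a" occurs twice in x (rows 1 and 3), "c" does not occur in x
key_x <- c("a", "b", "a")
key_y <- c("a", "c")

# For each key of y, the position of its FIRST match in x (NA if absent)
idx <- match(key_y, key_x)
print(idx)   # 1 NA: row 3 of x is never targeted
```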

The function is used as follows:
```
> replace.df(original, replacement,by=c('Name','Id'))
  Name Id Value1 Value2
1  joe  1    1.2     NA
2 john  2    2.2    9.2
```
Note that this exposes the typo in your original code: replacement contains a variable named 'value2' (lowercase v) instead of Value2 (capital V), so that column is not shared between the two data frames and Value2 is left untouched. After correcting this, the result becomes:
```
> replace.df(original, replacement,by=c('Name','Id'))
  Name Id Value1 Value2
1  joe  1    1.2     NA
2 john  2    2.2    5.9
```
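The misspelled column is skipped because, by default, replace.df() only updates columns whose names occur in both data frames, and R column names are case-sensitive. A minimal check using the column names from the question:

```r
# Column names as in the question, including the misspelled 'value2'
names_x <- c("Name", "Id", "Value1", "Value2")
names_y <- c("Name", "Id", "Value1", "value2")

shared <- intersect(names_x, names_y)
print(shared)   # "Value2" is absent because 'value2' differs in case
```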
You can also use the function to change the values in only some of the columns:
```
> replace.df(original, replacement,by=c('Name','Id'),cols='Value2')
  Name Id Value1 Value2
1  joe  1    1.2     NA
2 john  2     NA    5.9
```
The function:
```
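replace.df <- function(x, y, by, cols = NULL) {
    nx <- nrow(x)
    ny <- nrow(y)

    # Paste the identifying columns into one key string per row.
    # x and y are stacked first so both are converted the same way;
    # "\r" is used as a separator that is unlikely to occur in the data.
    bz <- do.call("paste", c(rbind(x[, by, drop = FALSE], y[, by, drop = FALSE]), sep = "\r"))

    key_x <- bz[seq_len(nx)]
    key_y <- bz[nx + seq_len(ny)]

    # For each row of y, the first matching row of x (NA if the key is not in x)
    idx <- match(key_y, key_x)
    found <- !is.na(idx)

    # By default, update all shared columns except the identifying ones
    if (is.null(cols)) {
        cols <- setdiff(intersect(names(x), names(y)), by)
    }

    # Rows of y without a counterpart in x are skipped
    if (any(found)) x[idx[found], cols] <- y[found, cols, drop = FALSE]
    x
}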
```
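The core idea in replace.df() is that several identifying columns are collapsed into a single string per row with do.call(paste, ...), after which base match() does the lookup. A stand-alone sketch of just that step, using the question's key columns and the "\r" separator from the function (assumed not to occur inside the key values):

```r
x <- data.frame(Name = c("joe", "john"), Id = c(1, 2))
y <- data.frame(Name = "john", Id = 2)
by <- c("Name", "Id")

# One string per row, e.g. "john\r2"; x and y are keyed the same way
key_x <- do.call(paste, c(x[by], sep = "\r"))
key_y <- do.call(paste, c(y[by], sep = "\r"))

# Row of x matching each row of y
pos <- match(key_y, key_x)
print(pos)   # 2
```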

梦毁少年i

2020-12-20 12:41

I wrote a function that uses the indexing method (see the answer by John Colby). Hopefully it is useful whenever you need to update one data frame with the values from another.

```
update.df.with.df <- function(original, replacement, key, value)
{
    ## PURPOSE: Update a data frame with the values in another data frame
    ## ----------------------------------------------------------------------
    ## ARGUMENT:
    ##   original: a data frame to update,
    ##   replacement: a data frame that has the updated values,
    ##   key: a character vector of variable names to form the unique key
    ##   value: a character vector of variable names to form the values that need to be updated
    ## ----------------------------------------------------------------------
    ## RETURN: The updated data frame from the old data frame "original".
    ## ----------------------------------------------------------------------
    ## AUTHOR: Feiming Chen,  Date:  2 Dec 2015, 15:08

    ## Use the key as row names. Pasting column by column avoids apply(),
    ## which goes through as.matrix() and can pad numbers (" 1" vs "1").
    n1 <- rownames(original) <- do.call(paste, c(original[key], sep = "."))
    n2 <- rownames(replacement) <- do.call(paste, c(replacement[key], sep = "."))

    n3 <- merge(data.frame(n = n1), data.frame(n = n2))[[1]] # make common keys
    n4 <- as.character(n3)              # merge() gives a factor in R < 4.0, character otherwise

    original[n4, value] <- replacement[n4, value] # update values on the common keys
    original
}
if (FALSE) {                            # Unit tests (not run)
    original <- data.frame(x=c(1, 2, 3), y=c(10, 20, 30))
    replacement <- data.frame(x=2, y=25)
    update.df.with.df(original, replacement, key="x", value="y") # data.frame(x=c(1, 2, 3), y=c(10, 25, 30))

    original <- data.frame(x=c(1, 2, 3), w=c("a", "b", "c"), y=c(10, 20, 30))
    replacement <- data.frame(x=2, w="b", y=25)
    update.df.with.df(original, replacement, key=c("x", "w"), value="y") # data.frame(x=c(1, 2, 3), w=c("a", "b", "c"), y=c(10, 25, 30))

    original = data.frame(Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,NA), Value2 = c(NA,9.2))
    replacement = data.frame(Name = c("john") , Id = 2 , Value1 = 2.2 , Value2 = 5.9)
    update.df.with.df(original, replacement, key="Id", value=c("Value1", "Value2"))
    ## goal = data.frame( Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,2.2), Value2 = c(NA,5.9) )
}
```
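One caveat when building keys with apply(..., 1, paste, collapse = "."): apply() first turns the data frame into a matrix with as.matrix(), and when the key columns mix numbers and text, the numbers are formatted to a common width. The same Id can then become " 1" in one data frame and "1" in another, so the keys no longer match. Pasting column by column with do.call(paste, ...) avoids this. A small demonstration with made-up data:

```r
df <- data.frame(x = c(1, 10), w = c("a", "b"))

# apply() goes through as.matrix(), which formats the numeric column to a common width
k_apply <- apply(df, 1, paste, collapse = ".")
# Pasting the columns directly converts each column with as.character() instead
k_paste <- do.call(paste, c(df, sep = "."))

print(unname(k_apply))  # " 1.a" "10.b"
print(k_paste)          # "1.a"  "10.b"
```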


梦谈多话

2020-12-20 12:43

```
# limit replacement to elements that have a correspondence in original
existing = replacement[is.element(replacement$Id, original$Id),]
# replace original at positions where IDs from existing match
original[match(existing$Id,original$Id),]=existing
```
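A quick check of this approach with the question's data, plus an extra replacement row with a hypothetical Id 3 that has no counterpart in original. That row is filtered out before the assignment, and match() places the remaining row by Id regardless of its position in replacement:

```r
original <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                       Value1 = c(1.2, NA), Value2 = c(NA, 9.2))
# Id 3 is a made-up row with no counterpart in `original`
replacement <- data.frame(Name = c("ann", "john"), Id = c(3, 2),
                          Value1 = c(0.5, 2.2), Value2 = c(0.7, 5.9))

existing <- replacement[is.element(replacement$Id, original$Id), ]
original[match(existing$Id, original$Id), ] <- existing
print(original)   # still two rows; john now has Value1 2.2 and Value2 5.9
```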


一整个雨季

2020-12-20 12:48

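```
library(plyr)
# Row names of the matching rows in each data frame; this assumes the shared
# Ids appear in the same order in both data frames
indexes_to_replace <- rownames(match_df(original, replacement, on = 'Id'))
indexes_from_replace <- rownames(match_df(replacement, original, on = 'Id'))
original[indexes_to_replace, ] <- replacement[indexes_from_replace, ]
```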

The on parameter of match_df() also accepts a vector of column names, so several key columns can be used.


囚心锁ツ

2020-12-20 12:50
Just set a unique ID as the row names. Then it is simple indexing:
```
rownames(original) = original$Id
rownames(replacement) = replacement$Id

original[rownames(replacement), ] = replacement
```
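One thing to be aware of with the row-name approach: if replacement contains an Id that is not yet in original, assigning to that row name appends a new row rather than failing. If only existing rows should be updated, the assignment can be restricted to the shared row names. A sketch with a made-up extra row (Id 3):

```r
original <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                       Value1 = c(1.2, NA), Value2 = c(NA, 9.2))
replacement <- data.frame(Name = c("john", "ann"), Id = c(2, 3),
                          Value1 = c(2.2, 0.5), Value2 = c(5.9, 0.7))
rownames(original) <- original$Id
rownames(replacement) <- replacement$Id

# Assigning all of replacement's row names appends Id 3 as a new row
appended <- original
appended[rownames(replacement), ] <- replacement
print(nrow(appended))   # 3

# Update only the rows whose Id already exists in `original`
common <- intersect(rownames(replacement), rownames(original))
original[common, ] <- replacement[common, ]
print(original)         # still two rows; john's row now holds 2.2 and 5.9
```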

鱼传尺愫

2020-12-20 12:52

Here is an approach using the digest package.

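```
library(digest)
# generate a key for each row: the md5 checksum of the first two columns (Name, Id)
check1 <- apply(original[, 1:2], 1, digest)
check2 <- apply(replacement[, 1:2], 1, digest)

# start from original and overwrite the rows whose key also occurs in replacement,
# matching rows by key so that the order of replacement does not matter
goal <- original
idx <- match(check2, check1)
ok <- !is.na(idx)
goal[idx[ok], ] <- replacement[ok, ]
```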
