Given 2 data frames that are identical in terms of column names/datatypes, where some columns uniquely identify the rows, is there an efficient function/method for one data frame to update the other?
Using base R, you can use the function replace.df() below, which is loosely based on the source code of merge.data.frame(). Contrary to some other solutions, this one allows multiple columns for identification. I use it rather often in my job. Feel free to copy and use.
This function handles rows in y that are not found in x. Mind that it does not check whether the key combinations are unique: match() returns only the first match, so only the first occurrence of a combination in x is replaced.
The function is used as follows:
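A minimal sketch of that match() behaviour, using made-up key vectors (keys_x and keys_y are hypothetical names, not part of the function). It also shows anyDuplicated() as one possible way to check for repeated keys before updating:

```r
# Hypothetical key vectors: "a" occurs twice in the table being searched
keys_x <- c("a", "b", "a")
keys_y <- c("a", "c")

# match() reports only the first position of "a" and NA for the unmatched "c"
pos <- match(keys_y, keys_x)
pos
# [1]  1 NA

# anyDuplicated() returns 0 when all keys are unique, so it can serve as a guard
has_dups <- anyDuplicated(keys_x) > 0
has_dups
# [1] TRUE
```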
> replace.df(original, replacement,by=c('Name','Id'))
Name Id Value1 Value2
1 joe 1 1.2 NA
2 john 2 2.2 9.2
Note that this effectively detects the typo in your original code: replacement contains a variable named 'value2' (lowercase v) instead of 'Value2' (capital V). After correcting this, the result becomes:
> replace.df(original, replacement,by=c('Name','Id'))
Name Id Value1 Value2
1 joe 1 1.2 NA
2 john 2 2.2 5.9
You can also use the function to change the values in only some of the columns:
> replace.df(original, replacement,by=c('Name','Id'),cols='Value2')
Name Id Value1 Value2
1 joe 1 1.2 NA
2 john 2 NA 5.9
The function:
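replace.df <- function(x, y, by, cols = NULL) {
    nx <- nrow(x)
    ny <- nrow(y)

    # Build one string key per row from the identifying columns
    bx <- x[, by, drop = FALSE]
    by.y <- y[, by, drop = FALSE]
    bz <- do.call("paste", c(rbind(bx, by.y), sep = "\r"))
    kx <- bz[seq_len(nx)]
    ky <- bz[nx + seq_len(ny)]

    # Row of x matched by each row of y (NA when a row of y is not in x)
    idx <- match(ky, kx)
    found <- !is.na(idx)

    if (is.null(cols)) {
        # Default: every shared column except the key columns
        cols <- setdiff(intersect(names(x), names(y)), by)
    }

    # Pair each matched row of y with its row in x, so row order does not matter
    if (any(found)) x[idx[found], cols] <- y[found, cols]
    x
}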
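To illustrate how the multi-column key in replace.df() works, here is a small sketch of the paste step on its own. The data frame df and the variable keys are hypothetical names used only for this example. The "\r" separator is there so that values from adjacent columns cannot run together into the same string:

```r
# Hypothetical two-column key, mirroring the Name/Id columns used above
df <- data.frame(Name = c("joe", "john"), Id = c(1, 2), stringsAsFactors = FALSE)

# One string per row: the column values joined by a carriage return
keys <- do.call("paste", c(df[, c("Name", "Id"), drop = FALSE], sep = "\r"))

# Without a separator, different value pairs can produce the same key ...
collide <- identical(paste("ab", "c", sep = ""), paste("a", "bc", sep = ""))

# ... whereas a separator that is unlikely to occur in the data keeps them apart
distinct <- !identical(paste("ab", "c", sep = "\r"), paste("a", "bc", sep = "\r"))
```

Any separator that cannot appear in the key columns would do; "\r" is simply what the function above uses.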
I produced a function that uses the method of indexing (see answer by John Colby above). Hopefully it can be useful for all such needs of updating one data frame with the values from another data frame.
update.df.with.df <- function(original, replacement, key, value)
{
## PURPOSE: Update a data frame with the values in another data frame
## ----------------------------------------------------------------------
## ARGUMENT:
## original: a data frame to update,
## replacement: a data frame that has the updated values,
## key: a character vector of variable names to form the unique key
## value: a character vector of variable names to form the values that need to be updated
## ----------------------------------------------------------------------
## RETURN: The updated data frame from the old data frame "original".
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 2 Dec 2015, 15:08
## build one key string per row; do.call(paste, ...) avoids going through as.matrix(), which can pad numbers with spaces
n1 <- rownames(original) <- do.call(paste, c(original[, key, drop=FALSE], sep="."))
n2 <- rownames(replacement) <- do.call(paste, c(replacement[, key, drop=FALSE], sep="."))
n4 <- intersect(n1, n2) # keys common to both data frames (works whether or not strings are factors)
original[n4, value] <- replacement[n4, value] # update values on the common keys
original
}
if (FALSE) { # Unit Test
original <- data.frame(x=c(1, 2, 3), y=c(10, 20, 30))
replacement <- data.frame(x=2, y=25)
update.df.with.df(original, replacement, key="x", value="y") # data.frame(x=c(1, 2, 3), y=c(10, 25, 30))
original <- data.frame(x=c(1, 2, 3), w=c("a", "b", "c"), y=c(10, 20, 30))
replacement <- data.frame(x=2, w="b", y=25)
update.df.with.df(original, replacement, key=c("x", "w"), value="y") # data.frame(x=c(1, 2, 3), w=c("a", "b", "c"), y=c(10, 25, 30))
original = data.frame(Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,NA), Value2 = c(NA,9.2))
replacement = data.frame(Name = c("john") , Id = 2 , Value1 = 2.2 , Value2 = 5.9)
update.df.with.df(original, replacement, key="Id", value=c("Value1", "Value2"))
## goal = data.frame( Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,2.2), Value2 = c(NA,5.9) )
}
# limit replacement to elements that have a correspondence in original
existing = replacement[is.element(replacement$Id, original$Id),]
# replace original at positions where IDs from existing match
original[match(existing$Id,original$Id),]=existing
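As a quick check of the two lines above, here is a sketch on made-up data (not the data from the question) in which replacement lists the Ids in a different order and contains an Id that original lacks. The variable pos is introduced only to make the position lookup visible:

```r
# Hypothetical data: Ids out of order, plus an unknown Id (4)
original <- data.frame(Id = c(1, 2, 3), Value = c(10, 20, 30))
replacement <- data.frame(Id = c(3, 4, 1), Value = c(33, 44, 11))

# Drop the rows of replacement that have no counterpart in original
existing <- replacement[is.element(replacement$Id, original$Id), ]

# For each remaining row, find the row of original that carries the same Id
pos <- match(existing$Id, original$Id)
original[pos, ] <- existing

original$Value
# [1] 11 20 33
```

Because match() returns row positions in original, the order of the rows in replacement does not matter here.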
require(plyr)
indexes_to_replace <- rownames(match_df(original,replacement,on='Id'))
indexes_from_replace<-rownames(match_df(replacement,original,on='Id'))
original[indexes_to_replace,] <- replacement[indexes_from_replace,]
The on parameter of match_df() can take a vector of column names as well.
Just set a unique ID as the row names. Then it is simple indexing:
rownames(original) = original$Id
rownames(replacement) = replacement$Id
original[rownames(replacement), ] = replacement
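One caveat, as far as I can tell: if replacement contains an Id that is not in original, indexing by that row name adds a new row instead of skipping it. A possible guard, sketched here on the sample data from the question plus one made-up extra row (the "jane" row and the variable common are assumptions for illustration), is to restrict the update to row names present in both data frames:

```r
original <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                       Value1 = c(1.2, NA), Value2 = c(NA, 9.2),
                       stringsAsFactors = FALSE)
# The second row (Id 3) is invented to show that unmatched rows are skipped
replacement <- data.frame(Name = c("john", "jane"), Id = c(2, 3),
                          Value1 = c(2.2, 7.7), Value2 = c(5.9, 8.8),
                          stringsAsFactors = FALSE)

rownames(original) <- original$Id
rownames(replacement) <- replacement$Id

# Only update rows whose Id exists in both data frames
common <- intersect(rownames(replacement), rownames(original))
original[common, ] <- replacement[common, ]
```

With this guard, original keeps its two rows and only the row for Id 2 is overwritten.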
Here is an approach using the digest package.
library(digest)
# generate keys for each row using the md5 checksum based on first two columns
check1 <- apply(original[,1:2], 1, digest)
check2 <- apply(replacement[,1:2], 1, digest)
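# copy original, then overwrite the rows whose key also occurs in replacement
# (assumes every row of replacement has a match, in the same order as in original)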
goal <- original
goal[check1 %in% check2,] <- replacement