Question
I have two data frames:
mydf1 <- data.frame(ID = c(1,2,3,4,5), color = c("red", NA, NA, NA, "green"), name = c("tom", "dick", "harry", "steve", "mike"))
mydf2 <- data.frame(ID = c(1,2,99), color = c("red", "orange", "yellow"), name = c("tom", "dick", "Aaron"))
I would like to update mydf1$color with the corresponding color from mydf2 for every row that matches on both ID and name. In the desired output, the color in row 2 becomes orange and everything else is left as is:
  ID  color  name
1  1    red   tom
2  2 orange  dick
3  3   <NA> harry
4  4   <NA> steve
5  5  green  mike
I tried solutions based on asymmetric merges from some previous posts, but they overwrote some of the fields in mydf1 that I wanted to keep. I then tried using match, as suggested in another post, but got an error. I'm not sure why the match condition is not working:
mydf1$color <- mydf2$color[match(mydf1[c("ID", "name")], mydf2[c("ID", "name")])]
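To see why the match call above does not work row by row, it helps to look at what it receives. A minimal base R check (the names keys1 and row_keys are illustrative only): a data frame is a list of columns, so match sees two elements (the whole ID and name columns), not five rows. Collapsing the key columns into one string per row gives match an ordinary vector with one element per row.

```r
mydf1 <- data.frame(ID = c(1,2,3,4,5), color = c("red", NA, NA, NA, "green"),
                    name = c("tom", "dick", "harry", "steve", "mike"))
keys1 <- mydf1[c("ID", "name")]

# A data.frame is a list of columns: length() counts columns, not rows
is.list(keys1)   # TRUE
length(keys1)    # 2
nrow(keys1)      # 5

# One character key per row, which match() can compare element by element
row_keys <- paste(keys1$ID, keys1$name)
row_keys         # "1 tom" "2 dick" "3 harry" "4 steve" "5 mike"
```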
Answer 1:
We can use an update join with data.table on the 'ID' and 'name' columns, assigning the corresponding 'color' from the second dataset into the first by reference (:=). The i. prefix in i.color refers to the column of mydf2, the table passed as i:
library(data.table)
setDT(mydf1)[mydf2, color := i.color, on = .(ID, name)]
mydf1
#    ID  color  name
#1:  1    red   tom
#2:  2 orange  dick
#3:  3   <NA> harry
#4:  4   <NA> steve
#5:  5  green  mike
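match works element-wise on vectors (or matrices), not row-wise on data.frames: a data.frame is a list of columns, so match compares whole columns rather than rows. If we want to use match, we can paste the 'ID' and 'name' columns of each dataset into one key per row and match those keys: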
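i1 <- match(paste(mydf1$ID, mydf1$name), paste(mydf2$ID, mydf2$name), nomatch = 0)
mydf1$color[i1 != 0] <- mydf2$color[i1]  # indexing drops the 0s, so values line up with rows where i1 != 0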
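The same paste/match idea can be wrapped in a small reusable function. This is a sketch under assumptions: update_by_keys() is a hypothetical helper, not part of base R or of the answer above; it joins the key columns with a separator ("\r") that is assumed never to occur in the data, and it converts the updated column to character so that new values such as "orange" are not rejected as invalid factor levels (relevant for data frames created with stringsAsFactors = TRUE, the default before R 4.0.0).

```r
# Hypothetical helper: for rows of `target` whose `keys` columns all match a
# row of `source`, copy `col` from `source`; all other rows keep their value.
update_by_keys <- function(target, source, keys, col, sep = "\r") {
  # One character key per row; assumes `sep` never appears in the key values.
  # Note: paste() turns NA into the string "NA", so NA keys match each other.
  key_t <- do.call(paste, c(unname(as.list(target[keys])), sep = sep))
  key_s <- do.call(paste, c(unname(as.list(source[keys])), sep = sep))
  idx <- match(key_t, key_s)   # first matching source row, NA if none
  hit <- !is.na(idx)
  # Work in character so values absent from the target (e.g. "orange") fit
  target[[col]] <- as.character(target[[col]])
  target[[col]][hit] <- as.character(source[[col]][idx[hit]])
  target
}

mydf1 <- data.frame(ID = c(1,2,3,4,5), color = c("red", NA, NA, NA, "green"),
                    name = c("tom", "dick", "harry", "steve", "mike"))
mydf2 <- data.frame(ID = c(1,2,99), color = c("red", "orange", "yellow"),
                    name = c("tom", "dick", "Aaron"))

res <- update_by_keys(mydf1, mydf2, keys = c("ID", "name"), col = "color")
res$color   # "red" "orange" NA NA "green"
```

Because the key is built with do.call(), the same function works for any number of key columns, and rows of mydf2 without a partner (ID 99, Aaron) are simply never referenced.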
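Or, using the tidyverse (note that coalesce() keeps an existing color in mydf1 and only fills in a missing one from mydf2):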
library(dplyr)
mydf1 %>%
left_join(mydf2, by = c("ID", "name")) %>%
transmute(ID, name, color = coalesce(as.character(color.x),
as.character(color.y)))
#  ID  name  color
#1  1   tom    red
#2  2  dick orange
#3  3 harry   <NA>
#4  4 steve   <NA>
#5  5  mike  green
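Newer dplyr versions also have a dedicated verb for this kind of update. A sketch, assuming dplyr 1.1.0 or later (where rows_update() has the unmatched argument) and character color columns (the default since R 4.0.0): rows_update() overwrites the non-key columns of matching rows in mydf1 with the values from mydf2, and unmatched = "ignore" skips the mydf2 row with no partner in mydf1 (ID 99, Aaron) instead of raising an error. Unlike the coalesce() version, it overwrites existing colors, and it keeps mydf1's original column order. The data frames are re-created so the snippet runs on its own, since setDT() above converts mydf1 in place.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `unmatched` argument

mydf1 <- data.frame(ID = c(1,2,3,4,5), color = c("red", NA, NA, NA, "green"),
                    name = c("tom", "dick", "harry", "steve", "mike"))
mydf2 <- data.frame(ID = c(1,2,99), color = c("red", "orange", "yellow"),
                    name = c("tom", "dick", "Aaron"))

# Update `color` in rows of mydf1 whose (ID, name) pair appears in mydf2;
# rows of mydf2 with no partner in mydf1 are skipped rather than erroring
res <- rows_update(mydf1, mydf2, by = c("ID", "name"), unmatched = "ignore")
res
```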
Source: https://stackoverflow.com/questions/60234780/r-update-column-based-on-matching-rows-from-another-data-frame