问题
I'm trying to achieve a function that I previously did in Excel, but having trouble finding a means to achieve it.
I have two datasets: one is my base dataset and the other is a lookup table. My base has two columns, the first and last names of people. My lookup table has these first two columns as well, but it also includes a replacement first name.
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert")
)
What I want to do is to replace the Fname with NewFname, dependent upon two conditions: that Fname and Sname match in both dataframes. This is because I have a dataset with other 40,000 rows of data that need to be processed. Ultimately, I'd hope to end up with the following dataframe:
People <- data.frame(
Fname = c("Tommy","Tim","Jerry","Ben","Roderick","John","Perry","Robert"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
However, I want a function solution so I don't have to manually input the conditions and replacement names individually. So far, I have the following (problematic) solution, which would involve generating a new column using mutate in dplyr, but it's not working
People %>%
mutate(NewName = if_else(
Fname == Lookup$Fname & Sname == Lookup$Sname, NewFname, Fname
))
回答1:
Simply use a left_join
and then mutate
on !is.na()
library(dplyr)
People %>%
left_join(Lookup, by = c("Fname", "Sname")) %>%
mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname))
# Fname Sname NewFname
# 1 Tommy Harper Tommy
# 2 Tim Kingston Tim
# 3 Jerry Ribery <NA>
# 4 Ben Ghazali <NA>
# 5 Roderick Baker Roderick
# 6 John Falcon <NA>
# 7 Perry Jefferson <NA>
# 8 Robert Lombardy Robert
I left NewFname
just to make it clear whats happening.
Data:
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"), stringsAsFactors = F
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert"), stringsAsFactors = F
)
来源:https://stackoverflow.com/questions/52890271/dplyr-string-match-and-replace-based-on-lookup-table-in-r