问题
I have 2 data frames like this
TEAM <- c("PE","PE","MPI","TDT","HPT","ATD")
CODE <- c(NA,"F","A","H","G","D")
df1 <- data.frame(TEAM,CODE)
CODE <- c(NA,"F100","A234","D664","H435","G123","A666","D345","G324",NA)
TEAM <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)
df2 <- data.frame(CODE,TEAM)
I am trying to update the TEAM in df2 by matching the first letter in code column in df1 with the code column in df2
My desired output for df2
CODE TEAM
1 NA PE
2 F100 PE
3 A234 MPI
4 D664 ATD
5 H435 TDT
6 G123 HPT
7 A666 MPI
8 D345 ATD
9 G324 HPT
10 NA PE
I am trying this way with sqldf but it is not right
library(sqldf)
df2 <- sqldf(c("update df2 set TEAM =
case
when CODE like '%F%' then 'PE'
when CODE like '%A%' then 'MPI'
when CODE like '%D%' then 'ATD'
when CODE like '%G%' then 'HPT'
when CODE like '%H%' then 'TDT'
else 'NA'
end"))
Can someone help me provide some directions on achieving this without sqldf?
回答1:
Using match
and substr
(both in base R):
df2$TEAM = df1$TEAM[match(substr(df2$CODE, 1, 1), df1$CODE)]
df2
# CODE TEAM
# 1 <NA> PE
# 2 F100 PE
# 3 A234 MPI
# 4 D664 ATD
# 5 H435 TDT
# 6 G123 HPT
# 7 A666 MPI
# 8 D345 ATD
# 9 G324 HPT
# 10 <NA> PE
This is expedient for a single case - if you're doing things like this frequently I would encourage you to just extract the first letter of code into its own column, CODE_1
, and then do a regular merge
or join.
回答2:
Assuming you are looking for an sqldf solution try this:
sqldf("select CODE,
case
when CODE like 'F%' then 'PE'
when CODE like 'A%' then 'MPI'
when CODE like 'D%' then 'ATD'
when CODE like 'G%' then 'HPT'
when CODE like 'H%' then 'TDT'
else 'PE'
end TEAM from df2", method = "raw")
or this:
sqldf("select df2.CODE, coalesce(df1.TEAM, 'PE') TEAM
from df2
left join df1 on substr(df2.CODE, 1, 1) = df1.CODE")
来源:https://stackoverflow.com/questions/39131770/update-a-column-in-df2-by-matching-patterns-in-columns-in-df1-df2-using-r