This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:
Case zip market
1
Here's the dplyr way of doing it:
library(tidyverse)
alldata %>%
select(-market) %>%
left_join(zipcodes, by="zip")
which, on my machine, performs roughly the same as lookup.
Since you don't care about the market column in alldata, you can first strip it off and then merge alldata and zipcodes on the zip column using merge:
merge(alldata[, c("Case", "zip")], zipcodes, by="zip")
The by parameter specifies the key criteria, so if you have a compound key, you could do something like by=c("zip", "otherfield").
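One thing to watch with merge: by default it keeps only rows whose key appears in both data frames, and it sorts the result on the by column. Here is a minimal sketch of the difference, using made-up toy data (the values and the zipcodes column order are assumptions for illustration, not taken from the question):

```r
# Toy data (hypothetical values, not from the question)
alldata <- data.frame(Case = 1:3,
                      zip = c("10001", "99999", "90001"),
                      market = NA,
                      stringsAsFactors = FALSE)
zipcodes <- data.frame(market = c("East", "West"),
                       zip = c("10001", "90001"),
                       stringsAsFactors = FALSE)

# Default merge: the row with zip "99999" has no match and is dropped
inner <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip")

# all.x = TRUE keeps every row of alldata; unmatched markets become NA
left <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip", all.x = TRUE)

print(inner)
print(left)
```

If every zip in alldata is guaranteed to appear in zipcodes, the two calls return the same rows; otherwise all.x = TRUE is probably what you want.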
With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:
library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])
Or
alldata$zip %l% zipcodes[, 2:1]
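To give a rough idea of what an environment lookup means, here is a base R sketch. This is not qdapTools' actual implementation, just an illustration of the general technique under assumed toy data: the keys are stored in a hashed environment and then fetched by name.

```r
# Toy lookup table (hypothetical values)
zipcodes <- data.frame(market = c("East", "West"),
                       zip = c("10001", "90001"),
                       stringsAsFactors = FALSE)

# Build a hashed environment mapping zip -> market
env <- new.env(hash = TRUE)
for (i in seq_len(nrow(zipcodes))) {
  assign(zipcodes$zip[i], zipcodes$market[i], envir = env)
}

# Look up a vector of zips; zips not in the table come back as NA
zips <- c("90001", "10001", "99999")
found <- mget(zips, envir = env, ifnotfound = NA_character_)
markets <- unlist(found, use.names = FALSE)
print(markets)
```

The keys have to be character strings here, since they are used as variable names inside the environment.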
Another option that worked for me and is very simple:
alldata$market <- with(zipcodes, market[match(alldata$zip, zip)])
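For anyone unsure why this works: match returns, for each zip in alldata, the row position of the first matching zip in zipcodes (or NA if there is none), and indexing market by those positions pulls out the corresponding values. A small sketch with made-up data (the values are assumptions for illustration):

```r
# Toy data (hypothetical values, not from the question)
alldata <- data.frame(Case = 1:3,
                      zip = c("90001", "99999", "10001"),
                      stringsAsFactors = FALSE)
zipcodes <- data.frame(market = c("East", "West"),
                       zip = c("10001", "90001"),
                       stringsAsFactors = FALSE)

# Position of each alldata zip within zipcodes$zip; NA when not found
idx <- match(alldata$zip, zipcodes$zip)

# Indexing by those positions keeps the original row order of alldata
alldata$market <- zipcodes$market[idx]
print(alldata)
```

Unlike merge, this keeps alldata in its original row order and never drops rows; unmatched zips simply get NA.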