Question
I'm an absolute beginner and hope someone can help me with a merge problem I've been stuck on for most of this evening. So far I have been unable to adapt solutions to similar problems to this particular example.
I've made a dummy data frame and vector to help illustrate my problem:
dumdata <- data.frame(id=c(1:5), pcode=c(1234,9876,4477,2734,3999), vlo=c(100,450,1000,1325,1500), vhi=c(300,950,1100,1450,1700))
id pcode vlo vhi
1 1234 100 300
2 9876 450 950
3 4477 1000 1100
4 2734 1325 1450
5 3999 1500 1700
vkey <- c(105,290,513,1399,1572,1683)
I would like to output a new data frame containing the rows of dumdata for which a value of vkey falls between vlo and vhi, together with that vkey value. In practice, every value of vkey will fall within a vlo-vhi range, and the ranges are always discrete.
The desired output would look like the following:
id pcode vlo vhi vkey
1 1234 100 300 105
1 1234 100 300 290
2 9876 450 950 513
4 2734 1325 1450 1399
5 3999 1500 1700 1572
5 3999 1500 1700 1683
Answer 1:
Rather than using for loops, you can construct the whole index vector in one go with sapply.
ind <- sapply(vkey, function(x) which(dumdata$vlo < x & x < dumdata$vhi))
data.frame(dumdata[ind,], vkey)
id pcode vlo vhi vkey
1 1 1234 100 300 105
1.1 1 1234 100 300 290
2 2 9876 450 950 513
4 4 2734 1325 1450 1399
5 5 3999 1500 1700 1572
5.1 5 3999 1500 1700 1683
If any value in vkey matches multiple rows in dumdata it gets uglier, though: you'll need to use lapply instead of sapply (so that ind is always a list) and then do
data.frame(dumdata[unlist(ind),], vkey = rep(vkey, sapply(ind, length)))
to return all matches. I take it from the example that this is not going to happen.
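The multiple-match case can be sketched as follows. This is only an illustration: the extra row with id 6 is a hypothetical addition (not part of the question's data) whose range overlaps others, and `lengths(ind)` is used as a shorthand for `sapply(ind, length)`.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# Hypothetical extra row (assumption for illustration) that overlaps rows 1 and 2
overlap <- rbind(dumdata, data.frame(id = 6, pcode = 5555, vlo = 250, vhi = 600))

# lapply always returns a list, one element per key, holding every matching row index
ind <- lapply(vkey, function(x) which(overlap$vlo < x & x < overlap$vhi))

# Repeat each key once per matching row so it lines up with unlist(ind)
result <- data.frame(overlap[unlist(ind), ], vkey = rep(vkey, lengths(ind)))
result
```

Keys that match no row contribute a zero-length element, so they simply drop out of the result.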
Edit:
For completeness I'll add that you can use mapply too, but this is mainly intended for the case when you need to make comparisons with more than one variable (for example, if you had vkey1 and vkey2 that need to fulfill a condition together).
ind <- mapply(function(x, y) which(dumdata$vlo < x & y < dumdata$vhi),
vkey1, vkey2)
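To make the mapply variant concrete, here is a small sketch. The vectors vkey1 and vkey2 are invented for illustration (the question only has a single vkey); each pair is matched to the row whose range encloses both values.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))

# Hypothetical paired keys (assumption): vkey1[i] and vkey2[i] must both lie in one range
vkey1 <- c(105, 460, 1510)
vkey2 <- c(290, 900, 1690)

# mapply walks both vectors in parallel, passing one element of each per call
ind <- mapply(function(x, y) which(dumdata$vlo < x & y < dumdata$vhi),
              vkey1, vkey2)

result <- data.frame(dumdata[ind, ], vkey1, vkey2)
result
```

As with sapply, this only simplifies to a plain index vector when every pair matches exactly one row.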
Answer 2:
Using the data.table package.
library(data.table)
# added a blank vkeyvalue column
dumdata <- data.table(
id=c(1:5),
pcode=c(1234,9876,4477,2734,3999),
vlo=c(100,450,1000,1325,1500),
vhi=c(300,950,1100,1450,1700),
vkeyvalue = as.integer(NA)
)
#initialising the final dataset being populated with the same structure as dumdata
finalfiltereddata <- dumdata[0]
vkey <- c(105,290,513,1399,1572,1683)
# looping through each key
for ( i in vkey)
{
#subsetting dumdata for values which meet the condition vlo < i & vhi > i
filtereddata <- dumdata[vlo < i & vhi > i]
#assigning the filtered data the respective vkeyvalue
filtereddata[, vkeyvalue := as.integer(i)]
#appending to the master data set
finalfiltereddata <- rbind(finalfiltereddata, filtereddata)
}
finalfiltereddata
# id pcode vlo vhi vkeyvalue
# 1: 1 1234 100 300 105
# 2: 1 1234 100 300 290
# 3: 2 9876 450 950 513
# 4: 4 2734 1325 1450 1399
# 5: 5 3999 1500 1700 1572
# 6: 5 3999 1500 1700 1683
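The loop above calls rbind on every iteration, which copies the accumulated table each time. One possible alternative, sketched here under the assumption that the data.table package is installed, is to build the pieces with lapply and bind them once with rbindlist. The name pieces is introduced only for this example.

```r
library(data.table)

dumdata <- data.table(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# One filtered table per key; the subset is a new table, so := does not touch dumdata
pieces <- lapply(vkey, function(k) dumdata[vlo < k & vhi > k][, vkeyvalue := k])

# Bind all pieces in a single step instead of growing a table inside a loop
finalfiltereddata <- rbindlist(pieces)
finalfiltereddata
```

For six keys the difference is negligible; it only starts to matter when vkey is long.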
Answer 3:
One option might be to use cut to create a matching "id" column for your "vkey" variable as follows:
cutBreaks <- sort(unlist(dumdata[c("vlo", "vhi")], use.names = FALSE))
cutLabels <- rep(1:nrow(dumdata), each = 2) * c(1, -1)
new <- data.frame(vkey = vkey, id = cut(vkey, breaks = cutBreaks,
                  labels = cutLabels[-length(cutLabels)]))
new
# vkey id
# 1 105 1
# 2 290 1
# 3 513 2
# 4 1399 4
# 5 1572 5
# 6 1683 5
Once you have that, merge should work without a problem:
merge(new, dumdata)
# id vkey pcode vlo vhi
# 1 1 105 1234 100 300
# 2 1 290 1234 100 300
# 3 2 513 9876 450 950
# 4 4 1399 2734 1325 1450
# 5 5 1572 3999 1500 1700
# 6 5 1683 3999 1500 1700
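A related base R idea, offered only as a sketch, is to look up the row with findInterval instead of cut. It assumes dumdata is sorted by vlo and that the ranges do not overlap, as in the question. The names idx and hit are introduced for this example.

```r
dumdata <- data.frame(id = 1:5,
                      pcode = c(1234, 9876, 4477, 2734, 3999),
                      vlo = c(100, 450, 1000, 1325, 1500),
                      vhi = c(300, 950, 1100, 1450, 1700))
vkey <- c(105, 290, 513, 1399, 1572, 1683)

# Index of the last vlo strictly below each key (0 if there is none);
# assumes vlo is sorted in increasing order
idx <- findInterval(vkey, dumdata$vlo, left.open = TRUE)

# Keep only keys that have a candidate row and also lie below that row's vhi
hit <- idx > 0
hit[hit] <- vkey[hit] < dumdata$vhi[idx[hit]]

result <- data.frame(dumdata[idx[hit], ], vkey = vkey[hit])
result
```

Keys that fall in a gap between ranges are dropped by the second check rather than being assigned to a neighbouring row.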
Source: https://stackoverflow.com/questions/19119022/merge-data-frame-based-on-vector-key