Question
I am a beginner in R programming. I am trying to look up site names in a data frame that holds the X and Y coordinate ranges and name of each site, and copy them into a second data frame of specific points.
FD <- matrix(data =c(rep(1, 500), rep(0, 500),
rnorm(1000, mean = 550000, sd=4000),
rnorm(1000, mean = 6350000, sd=20000), rep(NA, 1000)),
ncol = 4, nrow = 1000, byrow = FALSE)
colnames(FD) <- c('Survival', 'X', 'Y', 'Site')
FD <- as.data.frame(FD)
shpxt <- matrix(c(526654.7,526810.5 ,6309098,6309187,530405.4,530692,
6337699, 6338056,580432.7, 580541.9, 6380246,6380391,
585761.3, 585847.6, 6379665, 6379759, 584192.1, 584279.4,
6382358, 6382710, 583421.2, 583492.4, 6379356, 6379425,
532395.5, 532515.3 , 6336421, 6336587, 534694.6, 534791.2,
6335620, 6335740, 536749.8, 536957.5, 6337584, 6338130, 590049.6,
590419.4, 6372232, 6372432, 580443, 580756.5, 6386342, 6386473,
575263.9, 575413.7, 6380416, 6380530, 584625.1, 584753.9, 6381009,
6381335), ncol = 4, nrow = 13, byrow = TRUE)
sites <- c("Brandbaeltet", "Brusaa", "Granly", "Jerup Strand", "Knasborgvej",
"Milrimvej", "Overklitten", "Oversigtsareal", "Sandmosen",
"Strandby", "Troldkaer", "Vaagholt", "Videsletengen")
colnames(shpxt) <- c("Xmin", "Xmax", "Ymin", "Ymax")
shpxt <- as.data.frame(shpxt)
shpxt["Sites"] <- sites
My approach is using a nested for loop like this:
tester <- function(FD, shpxt)
{ for (i in 1:nrow(FD)) for (j in 1:nrow(shpxt)) # Open Function
{ if (FD[i,2] >= shpxt[j,1] | FD[i,2] <= shpxt[j,2] & # Open Loop
FD[i,3] >= shpxt[j,3] | FD[i,3] <= shpxt[j,4])
{ # Open Consequent
FD[i,4]=shpxt[j,5]
{break}
} else # Close Consequent
{FD[i,4] <- NA # Open alternative
} # Close alternative
} # Close loop
} # Close function
tester(FD, shpxt)
In essence, I want to find the site whose X and Y ranges contain each point in FD and copy that site name into FD$Site in row i. When I run the loop on my real data I get the following error message:
tester(FD, shpxt)
Error in if (FD[i, 2] >= shpxt[j, 1] | FD[i, 2] <= shpxt[j, 2] & FD[i, :
missing value where TRUE/FALSE needed
How do I fix the loop so that it copies the matching site name into my FD?
Kind Regards Thøger
Answer 1:
You want to merge two data frames considering a range match between key columns. Here are two solutions.
Using sqldf:
library(sqldf)
output <- sqldf("select * from FD left join shpxt
on (FD.X >= shpxt.Xmin and FD.X <= shpxt.Xmax and
FD.Y >= shpxt.Ymin and FD.Y <= shpxt.Ymax ) ")
Using data.table:
library(data.table)
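# convert both datasets to data.table (by reference)
setDT(FD)
setDT(shpxt)
# non-equi join: one output row per (site, matching point) pair
output <- FD[shpxt, on = .(X >= Xmin, X <= Xmax,   # X range
                           Y >= Ymin, Y <= Ymax),  # Y range
             nomatch = NA,                          # keep sites that match no point
             .(Survival, X = x.X, Y = x.Y,         # x.X / x.Y: the point's own coordinates
               Xmin, Xmax, Ymin, Ymax, Sites)]     # columns in the output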
There are other ways to solve this problem, as you will find in other SO questions here and here.
P.S. Keep in mind that a for loop is not necessarily the best solution.
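If you would rather keep the loop from the question, here is a minimal sketch of how it might be repaired. It assumes three things went wrong: the range test mixes `|` and `&` (all four comparisons have to hold, so they are joined with `&&` here), an NA coordinate makes `if` stop with "missing value where TRUE/FALSE needed", and the function never returns the modified FD. The name `assign_sites` and the three sample points are made up for illustration, and only two of the question's sites are included.

```r
# Two of the question's site boxes, enough to illustrate the idea
shpxt <- data.frame(
  Xmin  = c(526654.7, 530405.4),
  Xmax  = c(526810.5, 530692.0),
  Ymin  = c(6309098, 6337699),
  Ymax  = c(6309187, 6338056),
  Sites = c("Brandbaeltet", "Brusaa"),
  stringsAsFactors = FALSE
)

# Hypothetical points: inside Brusaa, outside every box, and one with a missing X
FD <- data.frame(Survival = c(1, 0, 1),
                 X = c(530500, 550000, NA),
                 Y = c(6337800, 6350000, 6309100))

# assign_sites is a hypothetical name, not part of the original post
assign_sites <- function(FD, shpxt) {
  FD$Site <- NA_character_                 # default: no site found
  for (i in seq_len(nrow(FD))) {
    for (j in seq_len(nrow(shpxt))) {
      # all four conditions must hold, hence && rather than |
      inside <- FD$X[i] >= shpxt$Xmin[j] && FD$X[i] <= shpxt$Xmax[j] &&
                FD$Y[i] >= shpxt$Ymin[j] && FD$Y[i] <= shpxt$Ymax[j]
      if (isTRUE(inside)) {                # isTRUE() treats NA as "no match"
        FD$Site[i] <- shpxt$Sites[j]
        break                              # stop at the first matching site
      }
    }
  }
  FD                                       # return the modified copy
}

FD <- assign_sites(FD, shpxt)
print(FD)
```

Note that the result has to be assigned back (`FD <- assign_sites(FD, shpxt)`), because R functions work on a copy of their arguments.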
Answer 2:
Here's a failed attempt in base R; perhaps someone can help correct it.
getSite <- function(x, y) {
return (shpxt[x >= shpxt['Xmin'] & x <= shpxt['Xmax'] &
y >= shpxt['Ymin'] & y <= shpxt['Ymax'] , "Sites"])
}
test it
p <- c(Survival=0, X=shpxt[2,1], Y=shpxt[2,3])
getSite(p[['X']],p[['Y']])
returns correctly with
[1] "Brusaa"
However
FD$Site<-apply(FD, 1, function(point) {getSite(point[['X']], point[['Y']])})
fails with
Error in `$<-.data.frame`(`*tmp*`, "Site", value = character(0)) :
  replacement has 0 rows, data has 1000
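A possible explanation for the failure, offered as a sketch rather than a verified fix: getSite returns `character(0)` for any point that lies outside every box, and when every row does that, `apply` collapses the results to a zero-length vector, which cannot fill 1000 rows. Returning `NA_character_` for "no match" and looping with `mapply` seems to avoid this. The `boxes` argument and the sample points below are assumptions made for illustration, and only two of the question's sites are included.

```r
# Two of the question's site boxes, enough to illustrate the idea
shpxt <- data.frame(
  Xmin  = c(526654.7, 530405.4),
  Xmax  = c(526810.5, 530692.0),
  Ymin  = c(6309098, 6337699),
  Ymax  = c(6309187, 6338056),
  Sites = c("Brandbaeltet", "Brusaa"),
  stringsAsFactors = FALSE
)

# Hypothetical points: inside Brusaa, outside every box, and one with a missing X
FD <- data.frame(Survival = c(1, 0, 1),
                 X = c(530500, 550000, NA),
                 Y = c(6337800, 6350000, 6309100))

# Always returns exactly one string: the first matching site, or NA
getSite <- function(x, y, boxes) {
  # which() drops NA comparisons, so a missing coordinate counts as "no match"
  hit <- which(x >= boxes$Xmin & x <= boxes$Xmax &
               y >= boxes$Ymin & y <= boxes$Ymax)
  if (length(hit) == 0L) NA_character_ else boxes$Sites[hit[1L]]
}

# mapply walks X and Y in parallel; every call yields length 1,
# so the result simplifies to a character vector with one entry per row
FD$Site <- unname(mapply(getSite, FD$X, FD$Y, MoreArgs = list(boxes = shpxt)))
print(FD)
```

Passing X and Y as separate vectors also avoids `apply` turning each row into a single atomic vector.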
Source: https://stackoverflow.com/questions/37158839/merge-two-data-frames-considering-a-range-match-between-key-columns