Question
I am trying to replace values in one data frame with values from another data frame, based on a condition.
Both data frames contain latitude, longitude and height, but one of them is shorter. For each point in the shorter data frame (5103 rows), I want to find the closest point by latitude and longitude (by calculating the distance) in the longer one (188426 rows), and then replace the height value in the longer data frame with the height from the shorter one.
In the code below, the shorter data frame is topo.rams and the longer one is topo.msg. The goal is to replace the height in topo.msg with the height values from topo.rams.
topo.rams:
longitud,latitud,tempc,u,v,w,relhum,speed,topo
-1.7107, 38.1464, 18.2412, -6.1744, -0.3708, 0.0000, 58.6447, 6.3584,460.5908
-1.7107, 38.1734, 18.5915, -5.7757, -0.3165, 0.0000, 61.8492, 5.9840,416.0403
topo.msg:
height,longitud,latitud
448.0, 1.70, 38.14
402.0, 1.70, 38.18
and the desired output (topo.msg modified):
height,longitud,latitud
460.5908, 1.70, 38.14
416.0403, 1.70, 38.18
and the code used:
# Read the data
topo.msg <- read.csv("MSG_DEM.txt", sep = ",", header = FALSE)
colnames(topo.msg) <- c("topoMSG", "longitud", "latitud")
topo.rams <- read.csv("topografia-rams.txt", sep = ",", header = TRUE)
# Number of points in each data set
puntos.rams <- nrow(topo.rams)
puntos.msg <- nrow(topo.msg)
# Start from the original MSG heights; matched points are overwritten below
topo.msg$topo <- topo.msg$topoMSG
# Find the MSG point closest to each RAMS point.
# The distance is computed from the lat-lon coordinates.
for (i in 1:puntos.rams) {
  distancia.min <- Inf
  indexj <- NA
  for (j in 1:puntos.msg) {
    dlon <- topo.rams$longitud[i] - topo.msg$longitud[j]
    dlat <- topo.rams$latitud[i] - topo.msg$latitud[j]
    # Only consider MSG points within 0.5 degrees in both directions
    if (abs(dlon) < 0.5 && abs(dlat) < 0.5) {
      distancia <- sqrt(dlon * dlon + dlat * dlat)
      if (distancia < distancia.min) {
        distancia.min <- distancia
        indexj <- j
      }
    }
  }
  if (!is.na(indexj)) topo.msg$topo[indexj] <- topo.rams$topo[i]
}
This code seems to run, but it takes a very long time. I have also tried to create a distance matrix with the geosphere package, following the post "Geographic distance between 2 lists of lat/lon coordinates", but R complains about allocating 3.6 Gb.
How can I address this issue? I would like to either optimize the loop or use a distance matrix. There must be a cleaner, more efficient way to calculate the distances.
Thanks in advance
Answer 1:
Following the comment by Patric, I switched from the inner loop to matrix/vector computation. The code now runs, and it is simpler and more efficient.
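# Start from the original MSG heights, then overwrite the matched points
topo.msg$topo <- topo.msg$topoMSG
for (i in seq_len(puntos.rams)) {
  # Differences between RAMS point i and every MSG point (vectorized)
  dlon <- topo.rams$longitud[i] - topo.msg$longitud
  dlat <- topo.rams$latitud[i] - topo.msg$latitud
  distancia <- sqrt(dlon * dlon + dlat * dlat)
  # Index of the closest MSG point
  indexj <- which.min(distancia)
  topo.msg$topo[indexj] <- topo.rams$topo[i]
}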
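One caveat about the distance itself: dlon and dlat are in degrees, and at around 38°N a degree of longitude covers only about cos(38°) ≈ 0.79 of the ground distance of a degree of latitude. A minimal sketch, assuming a flat-earth (equirectangular) approximation is good enough over these short distances, scales dlon by the cosine of the latitude. The helper name nearest_scaled and the coordinates below are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: index of the point in (lon, lat) closest to (lon0, lat0),
# using an equirectangular approximation (dlon scaled by cos(latitude)).
nearest_scaled <- function(lon0, lat0, lon, lat) {
  dlon <- (lon - lon0) * cos(lat0 * pi / 180)
  dlat <- lat - lat0
  which.min(dlon * dlon + dlat * dlat)  # sqrt is not needed to find the minimum
}

# Made-up candidates: point 1 lies 0.05 degrees to the east,
# point 2 lies 0.045 degrees to the north of the query point.
lon <- c(-1.65, -1.70)
lat <- c(38.14, 38.185)

idx_plain  <- which.min((lon - (-1.70))^2 + (lat - 38.14)^2)
idx_scaled <- nearest_scaled(-1.70, 38.14, lon, lat)
# With these values, plain degrees pick point 2 and the scaled version picks point 1.
```

Whether this matters depends on how regular the two grids are; for a true great-circle distance, a haversine formula would be the more careful choice.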
There's probably a more elegant way to do this calculation. I would appreciate any input.
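One possibly tidier variant, as a sketch only: wrap the nearest-point search in a function and collect all indices with vapply, then do the replacement in a single assignment. It only ever holds one vector of length nrow(to) in memory, so it avoids the full distance matrix. The function name nearest_rows and the small data frames are hypothetical stand-ins in the same layout as the question, with an extra third MSG row added for illustration.

```r
# Hypothetical helper: for every row of `from`, return the row index of the
# closest point in `to`. Both need columns longitud and latitud.
nearest_rows <- function(from, to) {
  vapply(seq_len(nrow(from)), function(i) {
    dlon <- from$longitud[i] - to$longitud
    dlat <- from$latitud[i] - to$latitud
    which.min(dlon * dlon + dlat * dlat)
  }, integer(1))
}

# Made-up sample data in the same layout as the question
topo.rams <- data.frame(longitud = c(-1.7107, -1.7107),
                        latitud  = c(38.1464, 38.1734),
                        topo     = c(460.5908, 416.0403))
topo.msg <- data.frame(height   = c(448, 402, 390),
                       longitud = c(-1.70, -1.70, -1.60),
                       latitud  = c(38.14, 38.18, 38.14))

idx <- nearest_rows(topo.rams, topo.msg)
# Overwrite only the matched MSG rows; unmatched rows keep their height
topo.msg$height[idx] <- topo.rams$topo
```

If two RAMS points map to the same MSG point, the later one wins, which is the same behaviour as the loop.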
Source: https://stackoverflow.com/questions/34198665/r-calculate-distance-based-on-latitude-longitude-from-two-data-frames