R - Spatial Join Between SpatialPoints (GPS coordinates) and SpatialLinesDataFrame

Posted by 安稳与你 on 2020-01-06 06:07:02

Question


I am working on a university project that combines data science and GIS. We need to find an open-source solution capable of extracting additional information from a massive dataset of GPS coordinates. Clearly, I cannot use any API with a daily request limit.

THE DATA

Here you can find a sample of the dataset the Professor provided us:

longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- seq.int(1, 5)  # one ID per point (the sample has 5 coordinates)

FIRST TASK: Already Accomplished!

The first step was joining my SpatialPoints with a SpatialPolygonsDataFrame using over() from sp. The SpatialPolygonsDataFrame was obtained through getData('GADM', country='ITA', level=3) from raster.
For this first task, the objective was to associate each GPS coordinate with the city and region it belongs to.
An example of the result I was able to obtain is:

require(sp)
require(raster)  # provides getData()
# data must be a data.frame with one row per point
my_spdf <- SpatialPointsDataFrame(coords = longlat, data = data.frame(ID = ID), proj4string = CRS(" +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 "))
italy_administrative_boundaries_level3 <- getData('GADM', country='ITA', level=3)
# over() returns, for each point, the attributes of the polygon that contains it
result <- over(my_spdf, italy_administrative_boundaries_level3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
result$ID <- ID
print(result)

SECOND TASK: MY QUESTION

Now things become tricky, because I need to attach additional, more detailed information such as road_name and road_type.
This information is contained in the shapefiles built from OpenStreetMap and available at: download.geofabrik.de/europe/italy.html. I loaded the shapefile in R, obtaining a SpatialLinesDataFrame:

require(rgdal)
shapefile_roads <- readOGR(dsn = "./road", layer = "roads")

Then, I naively tried to apply the same technique as for joining SpatialPoints and SpatialPolygonsDataFrame:

result <- over(my_spdf, shapefile_roads)

Clearly, the result is just NA. One possible reason that came to mind is that the coordinates in my_spdf do not lie exactly on the Lines in shapefile_roads, so I would need some kind of radius parameter. However, I am not really sure.

Can you suggest the correct approach to perform this spatial join between my SpatialPoints and the attributes of the SpatialLinesDataFrame obtained from the OpenStreetMap road shapefile?

If anything is unclear, please do not hesitate to ask.


Answer 1:


Your example data

library(raster)
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)        
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- data.frame(ID=1:5)
ita_gadm3 <- getData('GADM', country='ITA', level=3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
# use `sp::over` or `raster::extract`
result <- extract(ita_gadm3, longlat)

Some roads:

road <- spLines(cbind(longitude+.1, latitude), cbind(longitude-.1, rev(latitude)), cbind(longitude-.1, latitude+1), crs=crs(ita_gadm3))

Now find the nearest road segment. You can use geosphere::dist2Line because you are using angular (lon/lat) coordinates.

library(geosphere)
geosphere::dist2Line(longlat, road)
#     distance      lon      lat ID
#[1,] 2498.825 10.83212 44.53355  2
#[2,] 5527.646 11.03032 44.63470  1
#[3,] 5524.227 10.86062 44.63634  2
#[4,] 5577.372 10.86062 44.63634  2
#[5,] 5756.113 10.86062 44.63634  2

Note the ID column, which refers back to the roads (it is the index of the nearest line). The problem is that dist2Line is currently slow and you have a large data set.
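
The ID column can be used to look up the attributes of the matched road. A minimal sketch, assuming the road attribute table is a data frame with one row per line, in the same order as the lines (as in the @data slot of a SpatialLinesDataFrame). The road_attr table and its values are made up for illustration, and d simply re-enters the output printed above:

```r
# Output of dist2Line() as printed above, re-entered by hand so the sketch is self-contained
d <- cbind(distance = c(2498.825, 5527.646, 5524.227, 5577.372, 5756.113),
           lon      = c(10.83212, 11.03032, 10.86062, 10.86062, 10.86062),
           lat      = c(44.53355, 44.63470, 44.63634, 44.63634, 44.63634),
           ID       = c(2, 1, 2, 2, 2))

# Hypothetical attribute table, one row per road line (stand-in for shapefile_roads@data)
road_attr <- data.frame(road_name = c("Via A", "Via B", "Via C"),
                        road_type = c("primary", "residential", "track"),
                        stringsAsFactors = FALSE)

# Row-index the attribute table by the matched line ID, then attach the distance
joined <- cbind(point = seq_len(nrow(d)),
                road_attr[d[, "ID"], ],
                distance_m = d[, "distance"])
rownames(joined) <- NULL
print(joined)
```

With the real data, the lookup would presumably be shapefile_roads@data[d[, "ID"], ], provided the roads passed to dist2Line are that same object.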

The alternative is to transform your spatial data to a planar coordinate system appropriate for Italy and use gDistance.

library(rgeos)
sp <- SpatialPoints(longlat, proj4string=crs(ita_gadm3))
spita <- spTransform(sp, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
rdita <- spTransform(road, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")

gd <- gDistance(rdita, spita, byid=TRUE)
a <- apply(gd, 1, which.min)
a
#1 2 3 4 5 
#2 1 2 2 2 

That is, point 2 is closest to road 1. The other points are closest to road 2. You probably need to do that in batches of points or tiles to avoid getting a distance matrix that is too large.
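
The batching idea can be sketched in base R. This is only an illustration under assumptions: coordinates are already projected (metres), each road is a single two-vertex segment stored as a row (x1, y1, x2, y2), and dist_to_segment(), nearest_segment() and batch_size are hypothetical names, not part of rgeos. With real data, the inner step would be the gDistance() call above applied to a subset of the points.

```r
# Planar distance from points (n x 2 matrix) to one segment c(x1, y1, x2, y2)
dist_to_segment <- function(pts, seg) {
  a <- seg[1:2]
  ab <- seg[3:4] - a
  len2 <- sum(ab^2)
  if (len2 == 0) {
    u <- rep(0, nrow(pts))  # degenerate segment: use its single vertex
  } else {
    # Projection parameter of each point on the segment, clamped to [0, 1]
    u <- ((pts[, 1] - a[1]) * ab[1] + (pts[, 2] - a[2]) * ab[2]) / len2
    u <- pmin(1, pmax(0, u))
  }
  sqrt((pts[, 1] - (a[1] + u * ab[1]))^2 + (pts[, 2] - (a[2] + u * ab[2]))^2)
}

# Index of the nearest segment for every point, batch_size points at a time
nearest_segment <- function(pts, segs, batch_size = 1000) {
  if (nrow(pts) == 0) return(integer(0))
  out <- integer(nrow(pts))
  for (s in seq(1, nrow(pts), by = batch_size)) {
    idx <- s:min(s + batch_size - 1, nrow(pts))
    # Distance matrix for this batch only: rows = points, columns = segments
    gd <- vapply(seq_len(nrow(segs)),
                 function(j) dist_to_segment(pts[idx, , drop = FALSE], segs[j, ]),
                 numeric(length(idx)))
    gd <- matrix(gd, nrow = length(idx))
    out[idx] <- apply(gd, 1, which.min)
  }
  out
}

# Toy data: two horizontal segments at y = 0 and y = 10
segs <- rbind(c(0, 0, 10, 0), c(0, 10, 10, 10))
pts  <- rbind(c(1, 1), c(5, 9), c(9, 4), c(2, 7), c(12, 0))
nearest <- nearest_segment(pts, segs, batch_size = 2)
print(nearest)
```

Only a batch_size by number-of-roads matrix is held in memory at any time. Tiling the roads as well would shrink the other dimension, but then each point has to be compared against neighbouring tiles too.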

The buffer solution suggested by Sébastien could work in principle, but it gets complicated because there is no good buffer size. On the one hand, points may fall outside every buffer and, on the other hand, they may fall inside several overlapping buffers. If you use buffers, sp::over returns an arbitrary match when there are multiple matches, whereas raster::extract returns them all. Neither is pretty, and I would avoid this approach. Illustrated here:

b <- buffer(road, width=.15, dissolve=FALSE)
plot(b)
lines(road, col='red', lwd=2)
points(longlat, pch=20, col='blue')

extract(b, longlat)
#   point.ID poly.ID
#1         1       1
#2         1       2
#3         2       2
#4         2       1
#5         3       2
#6         3       1
#7         4       2
#8         4       1
#9         5       1
#10        5       2

over(sp, b)
#1 2 3 4 5 
#2 2 2 2 2 



Answer 2:


You need to join polygons with your points, not Lines. To do so, you can create a buffer area around your Lines using rgeos::gBuffer(). Be careful: the buffer width is expressed in the units of your Lines' coordinate system, which is probably degrees (WGS84) in your case (verify it). Choose a distance (width) that suits your case.

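# byid = TRUE keeps one buffer per road; without it all buffers are merged and the road attributes are lost
LinesBuffer <- rgeos::gBuffer(shapefile_roads, byid = TRUE, width = 0.01)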

Then you will be able to join the points with "LinesBuffer" using over() (provided they are in the same coordinate system).

result <- over(my_spdf, LinesBuffer)
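
To get a feel for what a width in degrees means on the ground, here is a rough back-of-the-envelope sketch. It assumes a spherical Earth with a mean radius of 6371 km (an approximation, not something stated in the answer) and uses a latitude close to the sample points. deg_to_m() is a hypothetical helper:

```r
# Approximate ground size of a buffer width given in degrees, on a sphere of radius R (metres)
deg_to_m <- function(width_deg, lat_deg, R = 6371000) {
  m_per_deg <- pi * R / 180  # metres per degree along a meridian
  c(north_south = width_deg * m_per_deg,
    # a degree of longitude shrinks with the cosine of the latitude
    east_west   = width_deg * m_per_deg * cos(lat_deg * pi / 180))
}

w <- deg_to_m(0.01, lat_deg = 44.6)
print(round(w))
```

Under these assumptions, width = 0.01 is roughly 1.1 km north-south and about 0.8 km east-west, so the buffer is not the same distance in every direction. Projecting both layers to a metric coordinate system with spTransform() before buffering avoids this.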


Source: https://stackoverflow.com/questions/47675571/r-spatial-join-between-spatialpoints-gps-coordinates-and-spatiallinesdatafra
