Question
I am working on a university project that combines data science and GIS. We need to find an open-source solution capable of obtaining additional information from a massive dataset of GPS coordinates. Clearly, I cannot use any API with a daily request limit.
THE DATA
Here you can find a sample of the dataset the Professor provided us:
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- seq.int(1, 5)  # one ID per coordinate pair
FIRST TASK: Already Accomplished!
The first step was joining my SpatialPoints with a SpatialPolygonsDataFrame using over() from sp. The SpatialPolygonsDataFrame was obtained through getData('GADM', country='ITA', level=3) from raster.
For this first task, the objective was to associate each pair of GPS coordinates with the City and Region it belongs to.
An example of the result I was able to obtain is:
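require(sp)
require(raster)  # provides getData()
# the data argument must be a data.frame with one row per point
my_spdf <- SpatialPointsDataFrame(coords = longlat, data = data.frame(ID = ID), proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))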
italy_administrative_boundaries_level3 <- getData('GADM', country='ITA', level=3)
result <- over(my_spdf, italy_administrative_boundaries_level3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
result$ID <- ID
print(result)
SECOND TASK: MY QUESTION
Now things become tricky, because I need to associate additional, more detailed information such as road_name and road_type.
This information is contained in the shapefiles created from OpenStreetMap and available at: download.geofabrik.de/europe/italy.html.
I loaded the shapefile in R, obtaining a SpatialLinesDataFrame:
require(rgdal)
shapefile_roads <- readOGR(dsn = "./road", layer = "roads")
Then, I naively tried to apply the same technique as for joining SpatialPoints and SpatialPolygonsDataFrame:
result <- over(my_spdf, shapefile_roads)
Clearly, the result is just NA. One possible reason that came to mind is that the coordinates in my_spdf do not lie exactly on the Lines in shapefile_roads, so I may need some kind of radius parameter. However, I am not really sure.
Can you suggest the correct approach to perform this spatial join between my SpatialPoints and the attributes of the SpatialLinesDataFrame obtained from the OpenStreetMap roads shapefile?
Please, if anything is unclear, do not hesitate to ask.
Answer 1:
Your example data
library(raster)
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- data.frame(ID=1:5)
ita_gadm3 <- getData('GADM', country='ITA', level=3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
#use `sp::over` or `raster::extract`
result <- extract(ita_gadm3, longlat)
Some roads:
road <- spLines(cbind(longitude+.1, latitude), cbind(longitude-.1, rev(latitude)), cbind(longitude-.1, latitude+1), crs=crs(ita_gadm3))
Now find the nearest road segment. You can use geosphere::dist2Line because you are using angular (lon/lat) coordinates.
library(geosphere)
geosphere::dist2Line(longlat, road)
# distance lon lat ID
#[1,] 2498.825 10.83212 44.53355 2
#[2,] 5527.646 11.03032 44.63470 1
#[3,] 5524.227 10.86062 44.63634 2
#[4,] 5577.372 10.86062 44.63634 2
#[5,] 5756.113 10.86062 44.63634 2
Note the variable ID, which refers back to the roads. The problem is that dist2Line is currently slow and you have a large data set.
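To get from that ID column to road attributes, you can index the attribute table of the lines object by ID. The following is only a sketch: the distances and IDs are copied from the output above, while road_attr and its name and type columns are hypothetical stand-ins for the attribute table of a real roads layer (for a SpatialLinesDataFrame that would be its @data slot, whose column names you need to check).

```r
# Distances and road IDs copied from the dist2Line output shown above
d <- cbind(distance = c(2498.825, 5527.646, 5524.227, 5577.372, 5756.113),
           ID       = c(2, 1, 2, 2, 2))

# Hypothetical attribute table with one row per road, in the same order as the lines
road_attr <- data.frame(name = c("Road A", "Road B", "Road C"),
                        type = c("primary", "secondary", "residential"),
                        stringsAsFactors = FALSE)

# Row-index the attributes by road ID so each point gets its nearest road's attributes
joined <- data.frame(point = seq_len(nrow(d)),
                     road_attr[d[, "ID"], ],
                     distance = d[, "distance"],
                     row.names = NULL)
print(joined)
```

Keeping the distance next to the attributes lets you discard matches that are implausibly far from any road.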
The alternative is to transform your spatial data to a planar coordinate system appropriate for Italy and use gDistance.
library(rgeos)
sp <- SpatialPoints(longlat, proj4string=crs(ita_gadm3))
spita <- spTransform(sp, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
rdita <- spTransform(road, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
gd <- gDistance(rdita, spita, byid=TRUE)
a <- apply(gd, 1, which.min)
a
#1 2 3 4 5
#2 1 2 2 2
That is, point 2 is closest to road 1. The other points are closest to road 2. You probably need to do that in batches of points or tiles to avoid getting a distance matrix that is too large.
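The batching idea could be sketched roughly like this. nearest_in_batches() is a hypothetical helper, not part of rgeos or sp; it only assumes that dist_fun returns a matrix with one row per point in the batch and one column per road, which is the layout of gd above. A small random matrix stands in for the real distances so the example runs without any spatial packages.

```r
# Hypothetical helper: process points in chunks so that only a
# batch_size x n_roads distance matrix exists at any one time.
nearest_in_batches <- function(n_points, dist_fun, batch_size = 1000) {
  stopifnot(n_points >= 1, batch_size >= 1)
  nearest  <- integer(n_points)
  distance <- numeric(n_points)
  for (s in seq(1, n_points, by = batch_size)) {
    idx <- s:min(s + batch_size - 1, n_points)
    gd  <- dist_fun(idx)                  # rows = points in batch, cols = roads
    nn  <- apply(gd, 1, which.min)        # nearest road per point
    nearest[idx]  <- nn
    distance[idx] <- gd[cbind(seq_along(idx), nn)]
  }
  data.frame(point = seq_len(n_points), road = nearest, distance = distance)
}

# Toy stand-in for the real distance computation (5 points, 3 roads)
set.seed(1)
toy <- matrix(runif(5 * 3), nrow = 5)
res <- nearest_in_batches(5, function(idx) toy[idx, , drop = FALSE], batch_size = 2)
print(res)
```

With the objects from this answer, dist_fun would be something like function(idx) gDistance(rdita, spita[idx, ], byid=TRUE). Tiling the roads as well would reduce the number of columns, but that is not shown here.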
The buffer solution suggested by Sébastien could work in principle, but it gets really complicated because there is no good buffer size. On the one hand, points may fall outside any buffer; on the other hand, they may overlap with several buffers. If you use buffers, sp::over returns an arbitrary match if there are multiple matches, whereas raster::extract will return them all. Neither is pretty, and I would avoid this approach. Illustrated here:
b <- buffer(road, width=.15, dissolve=F)
plot(b)
lines(road, col='red', lwd=2)
points(longlat, pch=20, col='blue')
extract(b, longlat)
# point.ID poly.ID
#1 1 1
#2 1 2
#3 2 2
#4 2 1
#5 3 2
#6 3 1
#7 4 2
#8 4 1
#9 5 1
#10 5 2
over(sp, b)
#1 2 3 4 5
#2 2 2 2 2
Answer 2:
You need to join polygons with your points, not Lines. To do so, you can create a buffer area around your Lines using rgeos::gBuffer(). Be careful, because the buffer width is expressed in the coordinate system of your Lines, probably degrees (WGS84) in your case (verify it). Choose the correct distance (width) according to your case.
# byid = TRUE keeps one buffer per road, so the road attributes are carried over
LinesBuffer <- rgeos::gBuffer(shapefile_roads, byid = TRUE, width = 0.01)
Then you will be able to join points with "LinesBuffer" using over (if they are in the same coordinate system).
result <- over(my_spdf, LinesBuffer)
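To get a feeling for what a width of 0.01 means when the layer is in degrees, here is a rough back-of-the-envelope conversion. It assumes a spherical Earth with about 111320 m per degree of latitude (an approximation I am introducing, not something from the shapefile) and uses a latitude of 44.6, close to the sample points in the question.

```r
# Rough spherical approximation; 111320 m per degree is an assumed constant
metres_per_degree <- 111320
lat       <- 44.6   # approximate latitude of the sample points
width_deg <- 0.01   # buffer width used above

# A degree of latitude is roughly constant; a degree of longitude shrinks with cos(latitude)
north_south_m <- width_deg * metres_per_degree
east_west_m   <- width_deg * metres_per_degree * cos(lat * pi / 180)

print(round(c(north_south = north_south_m, east_west = east_west_m)))
```

So a 0.01 degree buffer is roughly 1.1 km north-south but only about 0.8 km east-west at this latitude, which is very wide for a road and not even circular. Projecting to a metric coordinate system before buffering avoids this.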
Source: https://stackoverflow.com/questions/47675571/r-spatial-join-between-spatialpoints-gps-coordinates-and-spatiallinesdatafra