Question
Assume I have a data file called zone with 1994 strings of 2D coordinates denoting the vertices of polygons, like the following (the very first number on the RHS of each line denotes the zone):
c1 <- "1", "1 21, 31 50, 45 65, 75 80"
c2 <- "2", "3 20, 5 15, 2 26, 70 -85, 40 50, 60 80"
.....
c1993 <- "1993", "3 2, 2 -5, 0 60, 7 -58, -12 23, 56 611, 85 152"
c1994 <- "1994", "30 200, 50 -15, 20 260, 700 -850, -1 2, 5 6, 8 15"
Now I want to manipulate these strings so that, given a random lat-lon pair (say 12 and 20), I can check whether it falls into the first polygon, the second polygon, ..., or the 1994th polygon. The brute-force solution is: compare the x-coordinate (= 12) to all four x-coordinates and the y-coordinate (= 20) to all four y-coordinates in c1 and c2, respectively. The conclusion would be whether there is a valid **sandwich** inequality for each given coordinate x and y.
For example, using the solution process above, the point (12, 20) would be in c1 but not in c2.
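For what it's worth, a minimal sketch of that bounding-box style check (a hypothetical in_bbox helper, assuming poly is a two-column matrix of vertices; note this tests only the rectangle spanned by the vertices, so it is a necessary but not a sufficient condition for lying inside the polygon):

# Hypothetical helper illustrating the "sandwich" inequalities described above:
# the point must lie between the min and max of the vertex coordinates.
in_bbox <- function(point, poly){
  point[1] >= min(poly[, 1]) && point[1] <= max(poly[, 1]) &&
    point[2] >= min(poly[, 2]) && point[2] <= max(poly[, 2])
}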
My question: How could I achieve this goal in R?
My attempt: Thanks to Stéphane Laurent's help, I was able to generate all the matrices, each of a certain size, that store the lat-lon pairs of all the vertices of each polygon, with the following code:
library(readr)  # for read_delim

zone <- read_delim("[directory path to zone.csv file]", delim = ",", col_names = TRUE)
# keep only the coordinate part of each geometry string
for(i in 1:nrow(zone)){
  zone$geo[i] <- substr(zone$geo[i], 10, 135)
}
zone <- zone[complete.cases(zone),]  # drop rows with missing values
# extract every number in a string; the leading -? also captures negative
# coordinates such as -85, which occur in the data
Numextract <- function(string){
  unlist(regmatches(string, gregexpr("-?[[:digit:]]+\\.?[[:digit:]]*", string)))
}
for(i in 1:nrow(zone)){
  # each iteration overwrites poly1 and poly2, so only the last zone's
  # matrix survives the loop (this is the indexing problem described below)
  poly1 <- matrix(as.numeric(Numextract(zone$geo[i])), ncol=2, byrow=TRUE)
  poly2 <- cbind(poly1, c(i))  # append the zone index as a third column
}
However, as you might see, I need a way to index every matrix corresponding to each zone that is generated during the for() loop, because afterwards I can use another for() loop to determine which zone a point belongs to. But I have not been able to figure this out, so can anyone please help me with detailed code?
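For reference, one standard pattern is to keep each matrix in a list indexed by zone (a minimal sketch, assuming zone$geo holds the cleaned coordinate strings from above):

# store one vertex matrix per zone, retrievable later as polys[[i]]
polys <- vector("list", nrow(zone))
for(i in 1:nrow(zone)){
  polys[[i]] <- matrix(as.numeric(Numextract(zone$geo[i])), ncol = 2, byrow = TRUE)
}

The membership test can then loop over polys[[1]], ..., polys[[nrow(zone)]] for any given point.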
Actual dataset
Zone and polygons dataset
Lat-Lon pairs dataset
Answer 1:
First, define your polygons as matrices, each row representing a vertex:
poly1 <- rbind(c(1,21), c(31,50), c(45,65), c(75,80))
poly2 <- rbind(c(3,20), c(5,15), c(2,26), c(70,-85))
Define the point to be tested:
point <- c(12,20)
Now, use the pip2d function of the ptinpoly package:
> library(ptinpoly)
> pip2d(poly1, rbind(point))
[1] -1
> pip2d(poly2, rbind(point))
[1] 1
That means (see ?pip2d) that the point is outside poly1 and inside poly2.
Note the rbind(point) in the call to pip2d. We use rbind because, more generally, we can test several points against the same polygon at once.
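For instance (a hedged illustration; pip2d takes the query points as the rows of a matrix):

pts <- rbind(c(12, 20), c(50, 60))  # one query point per row
pip2d(poly2, pts)                   # one code per point (-1 outside, 1 inside)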
If you need help converting
c1 <- "1 21, 31 50, 45 65, 75 80"
to
poly1 <- rbind(c(1,21), c(31,50), c(45,65), c(75,80))
then maybe you should open another question.
Edit
Ok, do not open another question. You can proceed as follows.
c1 <- "1 21, 31 50, 45 65, 75 80"
Numextract <- function(string){
  # -? added so that negative coordinates (e.g. -85) are captured too
  unlist(regmatches(string, gregexpr("-?[[:digit:]]+\\.?[[:digit:]]*", string)))
}
poly1 <- matrix(as.numeric(Numextract(c1)), ncol=2, byrow=TRUE)
Which gives:
> poly1
[,1] [,2]
[1,] 1 21
[2,] 31 50
[3,] 45 65
[4,] 75 80
2nd Edit
For your second problem, your data are too big. The only solution I can see is to split the data into smaller pieces.
But first of all, it seems that the pip2d function also causes the R session to crash. So use another function: pnt.in.poly from the SDMTools package.
Here is a small modification of this function, making it faster by removing useless outputs:
library(SDMTools)

pnt.in.poly2 <- function(pnts, poly.pnts){
  # drop a duplicated closing vertex if the polygon repeats its first point
  if (poly.pnts[1, 1] == poly.pnts[nrow(poly.pnts), 1] &&
      poly.pnts[1, 2] == poly.pnts[nrow(poly.pnts), 2]){
    poly.pnts = poly.pnts[-1, ]
  }
  # call the package's C routine directly, skipping the data.frame output
  # that pnt.in.poly would otherwise build
  out = .Call("pip", pnts[, 1], pnts[, 2], nrow(pnts), poly.pnts[, 1], poly.pnts[, 2], nrow(poly.pnts), PACKAGE = "SDMTools")
  return(out)
}
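A quick sanity check on the small example from above (assuming, per ?pnt.in.poly, that the routine returns 1 for a point inside the polygon and 0 otherwise):

pnt.in.poly2(rbind(point), poly2)  # expected: 1, since (12,20) is inside poly2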
Now, as said before, split lat_lon into smaller pieces, 1 million rows each (except the last one, which is smaller):
lat_lon_list <- vector("list", 70)
for(i in 1:69){
  lat_lon_list[[i]] = lat_lon[(1+(i-1)*1e6):(i*1e6),]  # blocks of 1e6 rows
}
lat_lon_list[[70]] <- lat_lon[69000001:nrow(lat_lon),]  # remaining rows
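Equivalently, the chunking can be written without hard-coded bounds (a sketch using base R's split()):

# group the row indices into blocks of 1e6 and subset once per block
idx <- split(seq_len(nrow(lat_lon)), ceiling(seq_len(nrow(lat_lon)) / 1e6))
lat_lon_list <- lapply(idx, function(ix) lat_lon[ix, ])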
Now, run this code:
library(data.table)

# polys is the list of polygon vertex matrices, one per zone
# (see the sketch in the question above)
for(i in 1:70){
  DT <- data.table(V1 = pnt.in.poly2(lat_lon_list[[i]], polys[[1]]))
  for(j in 2:length(polys)){
    # one indicator column per polygon
    DT[, (sprintf("V%d", j)) := pnt.in.poly2(lat_lon_list[[i]], polys[[j]])]
  }
  fwrite(DT, sprintf("results%02d.csv", i))  # one csv file per chunk
  rm(DT)
}
If it works, it should generate 70 csv files, results01.csv, ..., results70.csv, each of size 1000000x1944 (except the last one, smaller); it is then possible to open them in Excel.
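To recover, for each point, which zone it falls in, the indicator files can be read back and scanned row-wise (a sketch; it assumes 1 marks "inside" and takes the first matching zone, giving NA when no polygon contains the point):

library(data.table)
res <- as.matrix(fread("results01.csv"))
zone_id <- apply(res, 1, function(r) which(r == 1)[1])  # first zone containing each point, or NA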
3rd Edit
I tried the code and got an error: Error: cannot allocate vector of size 7.6 Mb.
We need a finer splitting:
lat_lon_list <- vector("list", 2*69+1)
for(i in 1:(2*69)){
  lat_lon_list[[i]] = lat_lon[(1+(i-1)*1e6/2):(i*1e6/2),]  # blocks of 500,000 rows
}
lat_lon_list[[2*69+1]] <- lat_lon[69000001:nrow(lat_lon),]  # remaining rows
for(i in 1:(2*69+1)){
  DT <- data.table(V1 = pnt.in.poly2(lat_lon_list[[i]], polys[[1]]))
  for(j in 2:length(polys)){
    DT[, (sprintf("V%d", j)) := pnt.in.poly2(lat_lon_list[[i]], polys[[j]])]
  }
  fwrite(DT, sprintf("results%02d.csv", i))
  rm(DT)
}
Source: https://stackoverflow.com/questions/49828692/determine-if-a-given-lat-lon-belong-to-a-polygon