Question
Assume I have a data file called zone with 1994 strings of 2D coordinates denoting the vertices of polygons, like the following (the very first number on the RHS of each line denotes the zone):
c1 <- "1", "1 21, 31 50, 45 65, 75 80"
c2 <- "2", "3 20, 5 15, 2 26, 70 -85, 40 50, 60 80"
.....
c1993 <- "1993", "3 2, 2 -5, 0 60, 7 -58, -12 23, 56 611, 85 152"
c1994 <- "1994", "30 200, 50 -15, 20 260, 700 -850, -1 2, 5 6, 8 15"
Now I want to manipulate these strings so that, given a random lat-lon pair (say 12 and 20), I can check whether it falls into the first polygon, the second polygon, ..., or the 1994th polygon. The brute-force solution is: compare the x-coordinate (= 12) to all four x-coordinates and the y-coordinate (= 20) to all four y-coordinates in c1 and c2, respectively. The conclusion would be whether there is a valid **sandwich** inequality for each given coordinate x and y.
For example, using the solution process above, the point (12, 20) would be in c1 but not in c2.
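For what it's worth, a minimal sketch of that bounding-box style check (a hypothetical in_bbox helper, assuming poly is a two-column matrix of vertices; note this tests only the rectangle spanned by the vertices, so it is a necessary but not a sufficient condition for lying inside the polygon):

# Hypothetical helper illustrating the "sandwich" inequalities described above:
# the point must lie between the min and max of the vertex coordinates.
in_bbox <- function(point, poly){
  point[1] >= min(poly[, 1]) && point[1] <= max(poly[, 1]) &&
    point[2] >= min(poly[, 2]) && point[2] <= max(poly[, 2])
}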
My question: How could I achieve this goal in R?
My attempt: Thanks to Stéphane Laurent's help, I was able to generate all the matrices, each of a certain size, that store the lat-lon pairs of all the vertices of each polygon, with the following code:
library(readr)  # for read_delim

zone <- read_delim("[directory path to zone.csv file]", delim = ",", col_names = TRUE)
# keep only the coordinate part of each geometry string
for(i in 1:nrow(zone)){
  zone$geo[i] <- substr(zone$geo[i], 10, 135)
}
zone <- zone[complete.cases(zone),]  # drop rows with missing values
# extract every number in a string; the leading -? also captures negative
# coordinates such as -85, which occur in the data
Numextract <- function(string){
  unlist(regmatches(string, gregexpr("-?[[:digit:]]+\\.?[[:digit:]]*", string)))
}
for(i in 1:nrow(zone)){
  # each iteration overwrites poly1 and poly2, so only the last zone's
  # matrix survives the loop (this is the indexing problem described below)
  poly1 <- matrix(as.numeric(Numextract(zone$geo[i])), ncol=2, byrow=TRUE)
  poly2 <- cbind(poly1, c(i))  # append the zone index as a third column
}
However, as you might see, I need a way to index every matrix corresponding to each zone that is generated during the for() loop, because afterwards I can use another for() loop to determine which zone a point belongs to. But I have not been able to figure this out, so can anyone please help me with detailed code?
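For reference, one standard pattern is to keep each matrix in a list indexed by zone (a minimal sketch, assuming zone$geo holds the cleaned coordinate strings from above):

# store one vertex matrix per zone, retrievable later as polys[[i]]
polys <- vector("list", nrow(zone))
for(i in 1:nrow(zone)){
  polys[[i]] <- matrix(as.numeric(Numextract(zone$geo[i])), ncol = 2, byrow = TRUE)
}

The membership test can then loop over polys[[1]], ..., polys[[nrow(zone)]] for any given point.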
Actual dataset
Zone and polygons dataset
Lat-Lon pairs dataset
Answer 1:
First, define your polygons as matrices, each row representing a vertex:
poly1 <- rbind(c(1,21), c(31,50), c(45,65), c(75,80))
poly2 <- rbind(c(3,20), c(5,15), c(2,26), c(70,-85))
Define the point to be tested:
point <- c(12,20)
Now, use the pip2d function of the ptinpoly package:
> library(ptinpoly)
> pip2d(poly1, rbind(point))
[1] -1
> pip2d(poly2, rbind(point))
[1] 1
That means (see ?pip2d) that the point is outside poly1 and inside poly2.
Note the rbind(point) in the call to pip2d. We use rbind because, more generally, we can test several points against the same polygon at once.
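For instance (a hedged illustration; pip2d takes the query points as the rows of a matrix):

pts <- rbind(c(12, 20), c(50, 60))  # one query point per row
pip2d(poly2, pts)                   # one code per point (-1 outside, 1 inside)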
If you need help converting
c1 <- "1 21, 31 50, 45 65, 75 80"
to
poly1 <- rbind(c(1,21), c(31,50), c(45,65), c(75,80))
then maybe you should open another question.
Edit
Ok, do not open another question. You can proceed as follows.
c1 <- "1 21, 31 50, 45 65, 75 80"
Numextract <- function(string){
  # -? added so that negative coordinates (e.g. -85) are captured too
  unlist(regmatches(string, gregexpr("-?[[:digit:]]+\\.?[[:digit:]]*", string)))
}
poly1 <- matrix(as.numeric(Numextract(c1)), ncol=2, byrow=TRUE)
Which gives:
> poly1
[,1] [,2]
[1,] 1 21
[2,] 31 50
[3,] 45 65
[4,] 75 80
2nd Edit
For your second problem, your data are too big. The only solution I can see is to split the data into smaller pieces.
But first of all, it seems that the pip2d function also causes the R session to crash. So use another function: pnt.in.poly from the SDMTools package.
Here is a small modification of this function, making it faster by removing useless outputs:
library(SDMTools)

pnt.in.poly2 <- function(pnts, poly.pnts){
  # drop a duplicated closing vertex if the polygon repeats its first point
  if (poly.pnts[1, 1] == poly.pnts[nrow(poly.pnts), 1] &&
      poly.pnts[1, 2] == poly.pnts[nrow(poly.pnts), 2]){
    poly.pnts = poly.pnts[-1, ]
  }
  # call the package's C routine directly, skipping the data.frame output
  # that pnt.in.poly would otherwise build
  out = .Call("pip", pnts[, 1], pnts[, 2], nrow(pnts), poly.pnts[, 1], poly.pnts[, 2], nrow(poly.pnts), PACKAGE = "SDMTools")
  return(out)
}
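A quick sanity check on the small example from above (assuming, per ?pnt.in.poly, that the routine returns 1 for a point inside the polygon and 0 otherwise):

pnt.in.poly2(rbind(point), poly2)  # expected: 1, since (12,20) is inside poly2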
Now, as said before, split lat_lon into smaller pieces, 1 million rows each (except the last one, which is smaller):
lat_lon_list <- vector("list", 70)
for(i in 1:69){
  lat_lon_list[[i]] = lat_lon[(1+(i-1)*1e6):(i*1e6),]  # blocks of 1e6 rows
}
lat_lon_list[[70]] <- lat_lon[69000001:nrow(lat_lon),]  # remaining rows
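Equivalently, the chunking can be written without hard-coded bounds (a sketch using base R's split()):

# group the row indices into blocks of 1e6 and subset once per block
idx <- split(seq_len(nrow(lat_lon)), ceiling(seq_len(nrow(lat_lon)) / 1e6))
lat_lon_list <- lapply(idx, function(ix) lat_lon[ix, ])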
Now, run this code:
library(data.table)

# polys is the list of polygon vertex matrices, one per zone
# (see the sketch in the question above)
for(i in 1:70){
  DT <- data.table(V1 = pnt.in.poly2(lat_lon_list[[i]], polys[[1]]))
  for(j in 2:length(polys)){
    # one indicator column per polygon
    DT[, (sprintf("V%d", j)) := pnt.in.poly2(lat_lon_list[[i]], polys[[j]])]
  }
  fwrite(DT, sprintf("results%02d.csv", i))  # one csv file per chunk
  rm(DT)
}
If it works, it should generate 70 csv files, results01.csv, ..., results70.csv, each of size 1000000x1944 (except the last one, smaller); it is then possible to open them in Excel.
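To recover, for each point, which zone it falls in, the indicator files can be read back and scanned row-wise (a sketch; it assumes 1 marks "inside" and takes the first matching zone, giving NA when no polygon contains the point):

library(data.table)
res <- as.matrix(fread("results01.csv"))
zone_id <- apply(res, 1, function(r) which(r == 1)[1])  # first zone containing each point, or NA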
3rd Edit
I tried the code and got an error: Error: cannot allocate vector of size 7.6 Mb.
We need a finer splitting:
lat_lon_list <- vector("list", 2*69+1)
for(i in 1:(2*69)){
  lat_lon_list[[i]] = lat_lon[(1+(i-1)*1e6/2):(i*1e6/2),]  # blocks of 500,000 rows
}
lat_lon_list[[2*69+1]] <- lat_lon[69000001:nrow(lat_lon),]  # remaining rows
for(i in 1:(2*69+1)){
  DT <- data.table(V1 = pnt.in.poly2(lat_lon_list[[i]], polys[[1]]))
  for(j in 2:length(polys)){
    DT[, (sprintf("V%d", j)) := pnt.in.poly2(lat_lon_list[[i]], polys[[j]])]
  }
  fwrite(DT, sprintf("results%02d.csv", i))
  rm(DT)
}
Source: https://stackoverflow.com/questions/49828692/determine-if-a-given-lat-lon-belong-to-a-polygon