spatial clustering in R (simple example)

I have this simple data.frame

```
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)
```

3 Answers

醉酒成梦

2021-02-04 15:20
Since you have spatial data to cluster, DBSCAN is well suited to your data. You can run it with the dbscan() function from the fpc R package.
```
library(fpc)

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)

DBSCAN <- dbscan(cbind(lat, lon), eps = 1.5, MinPts = 3)
plot(lon, lat, col = DBSCAN$cluster, pch = 20)
```
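As a quick sanity check on the `eps = 1.5` value above, here is a minimal base-R sketch that computes each point's distance to its nearest neighbour. The names `pts`, `d`, and `nn_dist` are illustrative and not part of the original answer, and this is only one informal way to pick a radius, not the method fpc itself uses.

```r
# Same toy coordinates as in the question
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
pts <- cbind(lat, lon)

# Full Euclidean distance matrix (fine for a small data set)
d <- as.matrix(dist(pts))

# Ignore each point's zero distance to itself
diag(d) <- Inf

# Distance from every point to its nearest neighbour
nn_dist <- apply(d, 1, min)
print(round(nn_dist, 3))
```

In this data every point's nearest neighbour is one step away in both coordinates, so a radius slightly above that distance is enough to link neighbours inside a group while staying far below the gaps between groups.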

暗喜

2021-02-04 15:25

What about something like this:

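```
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

# kmeans() starts from random centers, so fix the seed for a reproducible result
set.seed(1)
km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)
```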

天命终不由人

2021-02-04 15:31
Here's a different approach. First, it assumes that the coordinates are WGS-84 (longitude/latitude) and not UTM (flat). Then it puts all neighbors within a given radius into the same cluster using hierarchical clustering with method = "single", which adopts a 'friends of friends' clustering strategy.

To compute the distance matrix, I'm using the rdist.earth function from the fields package. When working in kilometers, its default earth radius is 6378.388 km (the equatorial radius), which might not be what one is looking for, so I've changed it to 6371 km. See this article for more info.
```
library(fields)

lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40

# rdist.earth expects longitude in the first column and latitude in the second
coors <- data.frame(lon, lat)

# great-circle distance matrix in km
dist.in.km.matrix <- rdist.earth(coors, miles = FALSE, R = 6371)

# single-linkage clustering, cut at the distance threshold
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit, h = threshold.in.km)

plot(lon, lat, col = clusters, pch = 20)
```
This could be a good solution if you don't know the number of clusters in advance (which the k-means option requires), and it is somewhat related to the dbscan option with MinPts = 1.
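If the fields package is not available, the same idea can be sketched in base R with a hand-written haversine distance. This is an illustrative stand-in, not what rdist.earth does internally; the helper name `haversine_km` and the variable `d_km` are assumptions introduced here, while the radius of 6371 km and the 40 km threshold come from the answer above.

```r
# Great-circle (haversine) distance matrix in km between all pairs of points.
# haversine_km is a hypothetical helper written for this sketch.
haversine_km <- function(lon, lat, R = 6371) {
  to_rad <- pi / 180
  lon <- lon * to_rad
  lat <- lat * to_rad
  dlon <- outer(lon, lon, "-")   # pairwise longitude differences
  dlat <- outer(lat, lat, "-")   # pairwise latitude differences
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  a[a > 1] <- 1                  # guard against rounding slightly above 1
  2 * R * asin(sqrt(a))
}

lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)

d_km <- haversine_km(lon, lat)

# Single linkage: two points share a cluster if a chain of neighbours,
# each hop shorter than the threshold, connects them
fit <- hclust(as.dist(d_km), method = "single")
clusters <- cutree(fit, h = 40)
print(clusters)
```

Cutting a single-linkage tree at height h groups points that are connected through hops of at most h, which is why the threshold can be read directly as a neighbourhood radius in km.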

---EDIT---

With the original data:
```
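library(fields)

lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)

# rdist.earth expects longitude first, then latitude, so reorder the columns
dist.km <- rdist.earth(data[, c("lon", "lat")], miles = FALSE, R = 6371) # dist.km <- dist(data) if data is UTM
fit <- hclust(as.dist(dist.km), method = "single")
clusters <- cutree(fit, h = 1000) # h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)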
```