spatial clustering in R (simple example)

抹茶落季 2021-02-04 14:49

I have this simple data.frame:

 lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
 lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
 data <- data.frame(lat, lon)
3 answers
  • 2021-02-04 15:20

    Since you have spatial data to cluster, DBSCAN is well suited to your data: it is density-based and does not need the number of clusters in advance. You can run it with the dbscan() function provided by the R package fpc.

    library(fpc)
    
    lat<-c(1,2,3,10,11,12,20,21,22,23)
    lon<-c(5,6,7,30,31,32,50,51,52,53)
    
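    # eps is the neighborhood radius; MinPts is the minimum number of points for a dense region
    DBSCAN <- dbscan(cbind(lat, lon), eps = 1.5, MinPts = 3)
    # Color each point by its cluster label (label 0 marks noise)
    plot(lon, lat, col = DBSCAN$cluster, pch = 20)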
    

    [Figure: plot of the DBSCAN clustering]
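
    The choice of eps = 1.5 can be checked against the data. As a minimal sketch in base R (the names xy, d and nn are illustrative and not part of fpc), the nearest-neighbor distance of each point shows how large eps must be for neighboring points to be linked:

    ```r
    lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
    lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
    xy <- cbind(lat, lon)

    # Pairwise Euclidean distances; mask the diagonal so a point is not its own neighbor
    d <- as.matrix(dist(xy))
    diag(d) <- Inf

    # Distance from each point to its closest other point
    nn <- apply(d, 1, min)
    print(round(nn, 3))
    ```

    Every point here has a neighbor one step away in both lat and lon, so each nearest-neighbor distance is sqrt(2), about 1.414, which is just under the eps used above.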

  • 2021-02-04 15:25

    What about something like this:

    lat<-c(1,2,3,10,11,12,20,21,22,23)
    lon<-c(5,6,7,30,31,32,50,51,52,53)
    
    # k-means starts from random centers, so fix the seed for a reproducible result
    set.seed(1)
    km <- kmeans(cbind(lat, lon), centers = 3)
    plot(lon, lat, col = km$cluster, pch = 20)
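
    k-means needs the number of clusters up front. If that is not known, one common heuristic is to compare the total within-cluster sum of squares for several values of k. A minimal sketch, assuming base R's kmeans and illustrative names (xy, wss); nstart = 20 is an arbitrary choice to reduce sensitivity to the random start:

    ```r
    set.seed(1)
    lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
    lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
    xy <- cbind(lat, lon)

    # Total within-cluster sum of squares for k = 1..5
    wss <- sapply(1:5, function(k) {
      kmeans(xy, centers = k, nstart = 20)$tot.withinss
    })
    print(round(wss, 1))
    ```

    A sharp drop followed by a plateau suggests a reasonable k. For this data the points form three tight groups, so the drop should flatten around k = 3.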
    

  • 2021-02-04 15:31

    Here's a different approach. First, it assumes that the coordinates are WGS-84 (longitude/latitude in degrees) and not UTM (flat). It then puts all neighbors within a given radius into the same cluster using hierarchical clustering (with method = "single", which adopts a 'friends of friends' clustering strategy).

    To compute the distance matrix, I'm using the rdist.earth function from the fields package. With miles = FALSE, the default Earth radius in this package is 6378.388 km (the equatorial radius), which might not be what you are looking for, so I've changed it to 6371 km (the mean radius). See this article for more info.

    library(fields)
    lon = c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
    lat = c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
    threshold.in.km <- 40
    coors <- data.frame(lon,lat)
    
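    # distance matrix in km (rdist.earth expects longitude first, then latitude)
    dist.in.km.matrix <- rdist.earth(coors, miles = FALSE, R = 6371)
    
    # single-linkage clustering, cut at the distance threshold
    fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
    clusters <- cutree(fit, h = threshold.in.km)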
    
    plot(lon, lat, col = clusters, pch = 20)
    

    This could be a good solution if you don't know the number of clusters in advance (which the k-means option requires), and it is somewhat related to the dbscan option with MinPts = 1.
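
    For readers without the fields package, the distance matrix can also be built by hand. The following is a minimal sketch using the haversine great-circle formula in base R. haversine_km is a hypothetical helper written for this example, and its results may differ slightly from rdist.earth, whose internals are not assumed here. It reuses the coordinates and the 40 km threshold from the example above.

    ```r
    lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
    lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
    threshold.in.km <- 40

    # Hypothetical helper: great-circle distance in km on a sphere of radius R
    haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
      to_rad <- pi / 180
      dlat <- (lat2 - lat1) * to_rad
      dlon <- (lon2 - lon1) * to_rad
      a <- sin(dlat / 2)^2 +
        cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
      2 * R * asin(pmin(1, sqrt(a)))
    }

    # Full pairwise distance matrix
    n <- length(lon)
    d <- outer(seq_len(n), seq_len(n), function(i, j) {
      haversine_km(lon[i], lat[i], lon[j], lat[j])
    })

    # Single-linkage clustering, cut at the distance threshold
    fit <- hclust(as.dist(d), method = "single")
    clusters <- cutree(fit, h = threshold.in.km)
    print(clusters)
    ```

    With this threshold the first three points, which lie within roughly 30 km of a neighbor, end up in one cluster, and the last two points form a second one.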

    ---EDIT---

    With the original data:

    lat<-c(1,2,3,10,11,12,20,21,22,23)
    lon<-c(5,6,7,30,31,32,50,51,52,53)
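    data <- data.frame(lat, lon)
    
    # rdist.earth expects longitude in the first column and latitude in the second
    d <- rdist.earth(data[, c("lon", "lat")], miles = FALSE, R = 6371) # d <- dist(data) if data is UTM
    fit <- hclust(as.dist(d), method = "single")
    clusters <- cutree(fit, h = 1000) # h = 2 if data is UTM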
    plot(lon, lat, col = clusters, pch = 20)
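
    The planar variant mentioned in the comments (dist(data) with h = 2) can be written out in full. A minimal sketch, assuming the coordinates are already in a flat, projected system so that Euclidean distance is meaningful:

    ```r
    lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
    lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
    data <- data.frame(lat, lon)

    # Euclidean distances, single linkage, cut at height 2
    fit <- hclust(dist(data), method = "single")
    clusters <- cutree(fit, h = 2)
    print(table(clusters))
    ```

    Neighboring points are sqrt(2) apart and the groups are much farther from each other, so cutting at h = 2 yields three clusters of sizes 3, 3 and 4.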
    