Assign rows to a group based on spatial neighborhood and temporal criteria in R

Submitted by 瘦欲@ on 2019-12-06 12:38:34

I think this task requires something along the lines of hierarchical clustering.

Note, however, that the ids will necessarily be somewhat arbitrary. A cluster of fires can last longer than 4 days even though every fire in it is less than 4 days away from some other fire in the same cluster; in that case all of those fires end up with the same id.

library(dplyr)

# Create the distances
fire_dist <- fire_df %>%
  # Scale dates so that a gap of 4 days counts as a distance of 1
  mutate( norm_dates = as.numeric(dates)/4) %>% 
  # Only keep the three variables of interest
  select( rows, cols, norm_dates ) %>%
  # Compute distance using the L-infinity norm (maximum)
  dist( method="maximum" )

# Do hierarchical clustering with the "single" agglomeration method
fire_clust <- hclust(fire_dist, method="single")

# Cut the tree at height 1 and obtain groups
group_id <- cutree(fire_clust, h=1)

# First attach the group ids back to the data frame
fire_df2 <- cbind( fire_df, group_id ) %>%
  # Then sort the data
  arrange( group_id, dates, rows, cols ) 

# Print the first 10 records
fire_df2[1:10,]

(Make sure the dplyr package is installed; if it is not, run install.packages("dplyr", dependencies=TRUE). It is a very popular package for data manipulation.)
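To see why the maximum norm combined with a cut at height 1 encodes the "adjacent cell and within 4 days" rule, it can help to work through a few pairwise distances by hand. The sketch below uses three made-up records (the `toy` data frame is hypothetical, not from the question): the first two are in diagonally adjacent cells 3 days apart, and the third is in the same cell as the first but 9 days later.

```r
# Hypothetical toy records, for illustration only
toy <- data.frame(
  rows  = c(10, 11, 10),
  cols  = c(10, 11, 10),
  dates = as.Date(c("2000-01-01", "2000-01-04", "2000-01-10"))
)

# Same scaling as above: a gap of 4 days counts as 1 unit
toy$norm_dates <- as.numeric(toy$dates) / 4

# Maximum norm: the largest of |diff rows|, |diff cols|, |diff dates| / 4
toy_dist <- as.matrix(
  dist(toy[, c("rows", "cols", "norm_dates")], method = "maximum")
)
toy_dist

# Single linkage cut at height 1 joins records whose distance is <= 1
toy_id <- cutree(hclust(as.dist(toy_dist), method = "single"), h = 1)
toy_id
```

Records 1 and 2 are at distance max(1, 1, 3/4) = 1, so they share a group. Record 3 is at distance 9/4 from record 1 and 6/4 from record 2, so it gets its own group.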

A couple of simple tests:

Test #1. The same forest fire moving.

rows<-1:6
cols<-1:6
dates<-seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
fire_df<-data.frame(rows, cols, dates)

gives me this:

  rows cols      dates group_id
1    1    1 2000-01-01        1
2    2    2 2000-01-02        1
3    3    3 2000-01-03        1
4    4    4 2000-01-04        1
5    5    5 2000-01-05        1
6    6    6 2000-01-06        1

Test #2. 6 different random forest fires.

set.seed(1234)

rows<-sample(seq(1,50,1),6, replace=TRUE)
cols<-sample(seq(1,50,1),6, replace=TRUE)
dates<-sample(seq(from=as.Date("2000/01/01"), to=as.Date("2000/02/01"), by="day"),6, replace=TRUE)
fire_df<-data.frame(rows, cols, dates)

output:

  rows cols      dates group_id
1    6    1 2000-01-10        1
2   32   12 2000-01-30        2
3   31   34 2000-01-10        3
4   32   26 2000-01-27        4
5   44   35 2000-01-10        5
6   33   28 2000-01-09        6

Test #3: one expanding forest fire

dates <- seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
rows_start <- 50
cols_start <- 50

fire_df <- data.frame(dates = dates) %>%
    rowwise() %>%
    do({
      diff = as.numeric(.$dates - as.Date("2000/01/01"))
      expand.grid(rows=seq(rows_start-diff,rows_start+diff), 
                  cols=seq(cols_start-diff,cols_start+diff),
                  dates=.$dates) 
    })

gives me:

   rows cols      dates group_id
1    50   50 2000-01-01        1
2    49   49 2000-01-02        1
3    49   50 2000-01-02        1
4    49   51 2000-01-02        1
5    50   49 2000-01-02        1
6    50   50 2000-01-02        1
7    50   51 2000-01-02        1
8    51   49 2000-01-02        1
9    51   50 2000-01-02        1
10   51   51 2000-01-02        1

and so on. (All records identified correctly to belong to the same forest fire.)
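The arbitrariness mentioned at the start (a cluster lasting longer than 4 days) can be reproduced with a small example. This is only a sketch on made-up data: `chain_df` is a hypothetical data frame in which the same cell is detected burning every 3 days, so each record is within 4 days of its neighbour in time even though the whole sequence spans 12 days.

```r
# Hypothetical data: one cell detected 5 times, 3 days apart
chain_df <- data.frame(
  rows  = rep(20, 5),
  cols  = rep(20, 5),
  dates = seq(as.Date("2000-01-01"), by = "3 days", length.out = 5)
)

# Same distance as in the answer: maximum norm, dates scaled by 4
chain_dist <- dist(
  cbind(chain_df$rows, chain_df$cols, as.numeric(chain_df$dates) / 4),
  method = "maximum"
)

# Single linkage chains the records together one gap at a time
chain_id <- cutree(hclust(chain_dist, method = "single"), h = 1)

# Time span covered by each group, in days
chain_span <- tapply(
  as.numeric(chain_df$dates), chain_id,
  function(d) diff(range(d))
)
chain_span
```

All five records receive the same id, and the group covers 12 days. Whether that counts as one fire or several depends on how a fire event is defined; if long chains are not wanted, a per-group span like `chain_span` is one way to flag them for a closer look.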
