Creating origin-destination matrices with R

你的背包 2021-01-02 17:35

My data frame consists of individuals and the city each one lives in at a given point in time. I would like to generate one origin-destination matrix for each year, which records the numb

2 answers
  • 2021-01-02 18:08

    You could use reshape2's dcast and a loop to do this.

    library(reshape2)
    
    # Print the origin-destination counts for one year
    write_matrices <- function(year){
      # fun.aggregate = length makes each cell an explicit count of moves
      mat <- dcast(df[which(df$year_move == year), ], origin ~ destination,
                   fun.aggregate = length)
      print(year)
      print(mat)
    }
    
    # Get the unique years, dropping the NA in year_move
    years <- unique(df$year_move[!is.na(df$year_move)])
    
    # Loop through the years and print each result
    for (year in years){
      write_matrices(year)
    }
    

    The only thing this doesn't address is the requirement that each matrix be 5x5: if not all 5 cities appear in a given year, only the cities present in that year are shown.

    You could fix this by adding a step that first turns your observations into a frequency table, so the missing cities are included as zeros.
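
    One way that frequency-table step might look is sketched below, using base R's factor and table. The sample data and the cities vector are made up for illustration; only the column names origin, destination and year_move come from the code above.

```r
# Hypothetical sample data; column names follow the code above
df <- data.frame(
  origin      = c("City1", "City2", "City1"),
  destination = c("City2", "City3", "City2"),
  year_move   = c(2005, 2005, 2006),
  stringsAsFactors = FALSE
)

# Assumed full set of cities (the question mentions five)
cities <- paste0("City", 1:5)

# With fixed factor levels, table() keeps unobserved cities as zero rows/columns
df$origin      <- factor(df$origin, levels = cities)
df$destination <- factor(df$destination, levels = cities)

# One 5x5 table per year
od_by_year <- lapply(split(df, df$year_move), function(x) {
  table(origin = x$origin, destination = x$destination)
})

od_by_year[["2005"]]
```

    Because the levels are set once on the whole data frame, every yearly table has the same rows and columns, whichever cities happen to appear in that year.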

  • 2021-01-02 18:16

    You can split your data frame by id, perform the necessary computations on each id-specific data frame to grab all the moves made by that person, and then recombine the results:

    spl <- split(df, df$id)
    move.spl <- lapply(spl, function(x) {
      ret <- data.frame(from=head(x$city, -1), to=tail(x$city, -1),
                        year=ceiling((head(x$year, -1)+tail(x$year, -1))/2),
                        stringsAsFactors=FALSE)
      ret[ret$from != ret$to,]
    })
    (moves <- do.call(rbind, move.spl))
    #       from    to year
    # 1.1  City4 City2 2007
    # 1.2  City2 City1 2008
    # 1.3  City1 City5 2009
    # 1.4  City5 City4 2009
    # 1.5  City4 City2 2009
    # ...
    

    Because this code uses vectorized computations for each id, it should be a good deal quicker than looping through each row of your data frame as you did in the provided code.
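
    To see what the head/tail pairing does for a single id, here is a small sketch on made-up records for one person (the values are hypothetical; the column names year and city match the code above). Each row pairs one observation with the next, and the move is dated to the midpoint of the two observation years, rounded up.

```r
# Hypothetical records for one person, already ordered by year
x <- data.frame(
  year = c(2005, 2006, 2008, 2009),
  city = c("City1", "City1", "City3", "City2"),
  stringsAsFactors = FALSE
)

# head(v, -1) drops the last element and tail(v, -1) drops the first,
# so row i pairs observation i with observation i + 1
pairs <- data.frame(
  from = head(x$city, -1),
  to   = tail(x$city, -1),
  # Midpoint of the two observation years, rounded up
  year = ceiling((head(x$year, -1) + tail(x$year, -1)) / 2),
  stringsAsFactors = FALSE
)

# Keep only the pairs where the city actually changed
person_moves <- pairs[pairs$from != pairs$to, ]
person_moves
```

    The pairing relies on the rows for each id being sorted by year, so it may be worth ordering the data frame before splitting.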

    Now you could grab the year-specific 5x5 move matrices using split and table:

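    # Use one shared set of levels so every table has the same rows and columns
    cities <- sort(unique(c(as.character(moves$from), as.character(moves$to))))
    moves$from <- factor(moves$from, levels=cities)
    moves$to <- factor(moves$to, levels=cities)
    lapply(split(moves, moves$year), function(x) table(x$from, x$to))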
    # $`2005`
    #        
    #         City1 City2 City3 City4 City5
    #   City1     0     0     0     0     1
    #   City2     0     0     0     0     0
    #   City3     0     0     0     0     0
    #   City4     0     0     0     0     0
    #   City5     0     0     1     0     0
    # 
    # $`2006`
    #        
    #         City1 City2 City3 City4 City5
    #   City1     0     0     0     1     0
    #   City2     0     0     0     0     0
    #   City3     1     0     0     1     0
    #   City4     0     0     0     0     0
    #   City5     2     0     0     0     0
    # ...
    