My data frame consists of individuals and the city they live in at a given point in time. I would like to generate one origin-destination matrix for each year, which records the numb
You can split your data frame by id, compute each person's moves on the id-specific data frame, and then recombine the results:
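# One data frame per person
spl <- split(df, df$id)
move.spl <- lapply(spl, function(x) {
  # Pair each observation with the next one (rows must be in year order within id)
  ret <- data.frame(from=head(x$city, -1), to=tail(x$city, -1),
                    year=ceiling((head(x$year, -1)+tail(x$year, -1))/2),
                    stringsAsFactors=FALSE)
  # Keep only the pairs where the city actually changed
  ret[ret$from != ret$to,]
})
(moves <- do.call(rbind, move.spl))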
#      from    to year
# 1.1 City4 City2 2007
# 1.2 City2 City1 2008
# 1.3 City1 City5 2009
# 1.4 City5 City4 2009
# 1.5 City4 City2 2009
# ...
Because this code uses vectorized computations for each id, it should be a good deal quicker than looping through each row of your data frame as you did in the provided code.
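To make the pairing step concrete, here is a minimal sketch on a toy data frame. The toy values are made up for illustration, and the column names id, year and city are assumed to match your data. It also sorts by id and year first, since head()/tail() only pair consecutive observations if the rows are in chronological order within each id:

```r
# Toy data (hypothetical values; column names assumed to be id, year, city)
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2005, 2006, 2007, 2005, 2006, 2007),
  city = c("City1", "City2", "City2", "City3", "City3", "City1"),
  stringsAsFactors = FALSE
)

# Make sure consecutive rows within an id are consecutive observations
df <- df[order(df$id, df$year), ]

moves <- do.call(rbind, lapply(split(df, df$id), function(x) {
  # head(., -1) drops the last row, tail(., -1) drops the first,
  # so row i of `from` lines up with row i + 1 of the original data
  ret <- data.frame(from = head(x$city, -1), to = tail(x$city, -1),
                    year = ceiling((head(x$year, -1) + tail(x$year, -1)) / 2),
                    stringsAsFactors = FALSE)
  # Drop the pairs where the person stayed in the same city
  ret[ret$from != ret$to, ]
}))
```

With yearly observations, the ceiling of the midpoint assigns each move to the later of the two years, so person 1's move from City1 to City2 is recorded in 2006 and person 2's move from City3 to City1 in 2007.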
Now you could grab the year-specific 5x5 move matrices using split and table:
moves$from <- factor(moves$from)
moves$to <- factor(moves$to)
lapply(split(moves, moves$year), function(x) table(x$from, x$to))
# $`2005`
#
#         City1 City2 City3 City4 City5
#   City1     0     0     0     0     1
#   City2     0     0     0     0     0
#   City3     0     0     0     0     0
#   City4     0     0     0     0     0
#   City5     0     0     1     0     0
#
# $`2006`
#
#         City1 City2 City3 City4 City5
#   City1     0     0     0     1     0
#   City2     0     0     0     0     0
#   City3     1     0     0     1     0
#   City4     0     0     0     0     0
#   City5     2     0     0     0     0
# ...
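One caveat: factor(moves$from) and factor(moves$to) only pick up the cities that actually occur in each column, so the matrices are square only if every city appears at least once as both an origin and a destination. A possible safeguard, sketched here on a small made-up moves data frame, is to give both columns the same explicit set of levels:

```r
# Hypothetical moves data frame, same columns as above
moves <- data.frame(
  from = c("City1", "City3"),
  to   = c("City2", "City1"),
  year = c(2006, 2007),
  stringsAsFactors = FALSE
)

# Use one common set of levels for origins and destinations
cities <- sort(unique(c(moves$from, moves$to)))
moves$from <- factor(moves$from, levels = cities)
moves$to   <- factor(moves$to,   levels = cities)

# table() keeps unused factor levels, so every matrix has the same shape
od <- lapply(split(moves, moves$year),
             function(x) table(from = x$from, to = x$to))
```

If some cities never show up in any move, you would probably want to build cities from the city column of the original data frame instead, so that they still get a (zero) row and column.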