I have a data frame with a column t. I want to create n lagged columns with names like t-1, t-2, etc.
year t t-1 t-2
19620101 1 NA NA
1963010
If you are looking for efficiency, try data.table's new shift function:
library(data.table) # V >= 1.9.5
n <- 2
setDT(df)[, paste("t", 1:n) := shift(t, 1:n)][]
# t t 1 t 2
# 1: 1 NA NA
# 2: 2 1 NA
# 3: 3 2 1
# 4: 4 3 2
# 5: 5 4 3
# 6: 6 5 4
Here you can set any names for your new columns (within paste), and you don't need to bind the result back to the original, because the := operator updates your data set by reference.
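To get the exact names asked for in the question (t-1, t-2), one option is to build the names with paste0 and pass them on the left-hand side of := in parentheses. This is only a minimal sketch: the one-column df and the lag_names variable are made up for illustration, and it assumes a data.table version that provides shift.

```r
library(data.table)

# Illustrative data; the question's real data frame would be used instead.
df <- data.frame(t = 1:6)

n <- 2
lag_names <- paste0("t-", 1:n)   # "t-1" "t-2"

# (lag_names) tells := to use the values of the character vector as column names.
# shift() defaults to type = "lag" and fill = NA, returning one vector per lag.
setDT(df)[, (lag_names) := shift(t, 1:n)]
df
```

Because the new names contain a hyphen, they need backticks when referenced later, e.g. df[, `t-1`].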
I might build something around base R's embed():
x <- c(rep(NA,2),1:6)
embed(x,3)
# [,1] [,2] [,3]
# [1,] 1 NA NA
# [2,] 2 1 NA
# [3,] 3 2 1
# [4,] 4 3 2
# [5,] 5 4 3
# [6,] 6 5 4
Perhaps something like this:
f <- function(x, dimension, pad) {
if(!missing(pad)) {
x <- c(rep(pad, dimension-1), x)
}
embed(x, dimension)
}
f(1:6, dimension=3, pad=NA)
# [,1] [,2] [,3]
# [1,] 1 NA NA
# [2,] 2 1 NA
# [3,] 3 2 1
# [4,] 4 3 2
# [5,] 5 4 3
# [6,] 6 5 4
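If the result should be a data frame with the column names from the question rather than a bare matrix, the same embed() idea could be wrapped roughly as follows. The helper name lag_frame is hypothetical (it is not part of base R), and the sketch assumes NA padding and n >= 1.

```r
# Hypothetical helper built on embed(); not a base R function.
lag_frame <- function(x, n) {
  # Pad with n leading NAs so the result keeps length(x) rows.
  m <- embed(c(rep(NA, n), x), n + 1)
  # Column 1 is the unlagged series; column k + 1 holds lag k.
  colnames(m) <- c("t", paste0("t-", seq_len(n)))
  # check.names = FALSE keeps the hyphenated names intact.
  data.frame(m, check.names = FALSE)
}

res <- lag_frame(1:6, n = 2)
res
```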
1) lag.zoo The lag.zoo function in the zoo package can accept a vector of lags. Here we want the 0th lag, the -1 lag and the -2 lag:
library(zoo)
cbind(DF[-2], coredata(lag(zoo(DF$t), 0:-2)))
giving:
year lag0 lag-1 lag-2
1 19620101 1 NA NA
2 19630102 2 1 NA
3 19640103 3 2 1
4 19650104 4 3 2
5 19650104 5 4 3
6 19650104 6 5 4
This matches what you have in the question, but are you sure that is what you want? The last three rows all have the same date, so the 4th row, for example, is being lagged to the same date.
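A quick way to see which rows share a date before lagging is sketched below, using the sample data from the note at the end of this answer. The variable name dup is only illustrative.

```r
DF <- data.frame(year = c(19620101, 19630102, 19640103, 19650104, 19650104,
                          19650104), t = 1:6)

# Flag every row whose date occurs more than once (first occurrence included).
dup <- duplicated(DF$year) | duplicated(DF$year, fromLast = TRUE)
DF[dup, ]
```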
2) head Defining a simple Lag function, we can do this using only base R:
Lag <- function(x, n = 1) c(rep(NA, n), head(x, -n)) # n > 0
data.frame(DF, `t-1` = Lag(DF$t), `t-2` = Lag(DF$t, 2), check.names = FALSE)
giving:
year t t-1 t-2
1 19620101 1 NA NA
2 19630102 2 1 NA
3 19640103 3 2 1
4 19650104 4 3 2
5 19650104 5 4 3
6 19650104 6 5 4
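For an arbitrary number of lags n, the same Lag function could be applied in a loop instead of writing each column by hand. This is a sketch rather than a definitive solution: it assumes 0 < n < nrow(DF), and the variable name out is illustrative.

```r
DF <- data.frame(year = c(19620101, 19630102, 19640103, 19650104, 19650104,
                          19650104), t = 1:6)

Lag <- function(x, n = 1) c(rep(NA, n), head(x, -n))  # n > 0

n <- 2
out <- DF
for (k in seq_len(n)) {
  # [[<- stores the name as given, so "t-1" is not changed to "t.1".
  out[[paste0("t-", k)]] <- Lag(DF$t, k)
}
out
```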
Note: We used this as our data frame:
DF <- data.frame(year = c(19620101, 19630102, 19640103, 19650104, 19650104,
19650104), t = 1:6)