What is the most efficient way to make a matrix of lagged variables in R for an arbitrary variable (i.e. not a regular time series)?
For example:
The method that works best for me is to use the lag function from the dplyr package.
Example:
> require(dplyr)
> lag(1:10, 1)
[1] NA 1 2 3 4 5 6 7 8 9
> lag(1:10, 2)
[1] NA NA 1 2 3 4 5 6 7 8
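To get from single lagged vectors to the matrix the question asks for, the same shift can be applied once per lag and the results bound as columns. Here is a minimal sketch in base R, so it runs without dplyr; lag_matrix is a hypothetical helper name (not part of any package), and it assumes non-negative lags no larger than length(x).

```r
# Hypothetical helper (not from dplyr): one column per lag, NA-padded at the top.
lag_matrix <- function(x, lags) {
  stopifnot(length(lags) > 0, all(lags >= 0), all(lags <= length(x)))
  # Shift x down by k positions: k leading NAs, then the first n - k values.
  cols <- lapply(lags, function(k) c(rep(NA, k), head(x, length(x) - k)))
  m <- do.call(cbind, cols)
  colnames(m) <- paste0("lag", lags)
  m
}

m <- lag_matrix(1:4, 0:2)
print(m)
```

Each column here should match what lag(x, k) returns for the same k, so the inner expression could probably be swapped for a dplyr call if that package is already loaded.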
The running function in the gtools package does more or less what you want:
> require("gtools")
> running(1:4, fun=I, width=3, allow.fewer=TRUE)
$`1:1`
[1] 1
$`1:2`
[1] 1 2
$`1:3`
[1] 1 2 3
$`2:4`
[1] 2 3 4
Use a proper class for your objects; base R has ts, which has a lag() function to operate on it. Note that these ts objects came from a time when 'delta' or 'frequency' were constant: monthly or quarterly data, as in macroeconomic series.
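As a rough illustration of the ts route (a sketch, not the only way to do it): lag() on a ts object shifts the time index rather than the values, so a lagged column appears once the original and shifted series are bound together by time. The column name lag1 is just a label chosen for this example, and stats::lag is spelled out because dplyr masks lag when it is loaded.

```r
# Regular series with unit frequency, observed at times 1..5.
x <- ts(1:5)

# stats::lag(x, -1) moves the series one step later in time (times 2..6).
# cbind() on ts objects aligns by time and pads with NA where a series
# has no observation.
m <- cbind(x, lag1 = stats::lag(x, -1))
print(m)
```

The result spans times 1 to 6, so the last row has NA for x; drop it if you only want rows where the original series is observed.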
For irregular data such as (business-)daily, use the zoo or xts packages, which can also deal (very well!) with lags. To go further from there, you can use packages like dynlm or dlm, which allow for dynamic regression models with lags.
The Task Views on Time Series, Econometrics, Finance all have further pointers.
You can achieve this using the built-in embed() function, where its second 'dimension' argument is equivalent to what you've called 'lag':
x <- c(NA,NA,1,2,3,4)
embed(x,3)
## returns
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    2    1   NA
[3,]    3    2    1
[4,]    4    3    2
embed() was discussed in a previous answer by Joshua Reich. (Note that I prepended x with NAs to replicate your desired output.)
It's not particularly well-named but it is quite useful and powerful for operations involving sliding windows, such as rolling sums and moving averages.
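For the sliding-window uses mentioned above, a minimal sketch: each row of embed(x, 3) holds one window of three consecutive values (most recent first), so row-wise summaries give the rolling statistic. The sample vector and window width are arbitrary choices for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Each row is a window of 3 consecutive values: (x[t], x[t-1], x[t-2]).
w <- embed(x, 3)

roll_sum  <- rowSums(w)    # 6 9 12 15
roll_mean <- rowMeans(w)   # 2 3 4 5

print(roll_sum)
print(roll_mean)
```

Note that the result has length(x) - 2 elements, since the first two positions have no complete window; prepend NAs to x, as in the example above, if you need the output aligned with the input.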