I want to create a lagged variable for pm10 and used the following code, but I could not get what I wanted. How can I create a lag of pm10?
I guess a solution for dummies would be to create a "lagged" copy of the vector or column (prepend an NA and drop the last element, so the length stays the same) and then bind the columns together:
x <- 1:10                         # example vector
x_lagged <- c(NA, head(x, -1))    # prepend NA, drop the last element
new_x <- cbind(x, x_lagged)       # original and lagged columns side by side
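The same idea extends to a lag of k periods: pad the start with k NAs and keep only the first n - k values. A minimal sketch; `lag_k` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper (not from any package): lag a vector by k >= 0 positions,
# padding the start with NA so the result keeps the length of x.
lag_k <- function(x, k = 1L) {
  stopifnot(length(k) == 1, k >= 0)
  n <- length(x)
  k <- min(k, n)                                  # cannot shift by more than the length
  idx <- c(rep(NA_integer_, k), seq_len(n - k))   # k NA positions, then 1..(n - k)
  x[idx]                                          # NA indices give NA of x's own type
}

x <- 1:10
new_x <- cbind(x, lag1 = lag_k(x, 1), lag2 = lag_k(x, 2))
head(new_x, 3)
```

Indexing with `NA` (rather than building the result with `c(NA, ...)`) keeps the type of `x`, so an integer column stays integer.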
Another alternative is the shift() function from the data.table package:
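library(data.table)
# convert df2 to a data.table by reference, then add both columns in one step
setDT(df2)[, c("l1pm10", "l1pm102") := .(shift(pm10, 1L, fill = NA, type = "lag"),   # previous row's value
                                         shift(pm10, 1L, fill = NA, type = "lead"))]  # next row's value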
This gives:
> df2
    var1     pm10   l1pm10  l1pm102
 1:    1 26.95607       NA       NA
 2:    2       NA 26.95607 32.83869
 3:    3 32.83869       NA 39.95607
 4:    4 39.95607 32.83869       NA
 5:    5       NA 39.95607 40.95607
 6:    6 40.95607       NA 33.95607
 7:    7 33.95607 40.95607 28.95607
 8:    8 28.95607 33.95607 32.34877
 9:    9 32.34877 28.95607       NA
10:   10       NA 32.34877       NA
Used data:
df2 <- structure(list(var1 = 1:10, pm10 = c(26.956073733, NA, 32.838694951,
39.9560737332, NA, 40.9560737332, 33.956073733, 28.956073733,
32.348770798, NA)), .Names = c("var1", "pm10"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
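In base R, the function lag() (from the stats package) is meant for time-series objects: applied to a plain vector or data-frame column, it only changes the time attribute and does not move the values. Here you have a data frame, so the situation is different.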
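To see the difference, here is a small base-R sketch (the vector `x` is made up for illustration). On a plain vector, `stats::lag()` leaves the values in place and only rewrites the `tsp` attribute; with `ts` objects the shift becomes visible once `cbind()` aligns the series by time:

```r
x <- c(5, 7, 9)   # made-up example values

# Plain vector: stats::lag() only rewrites the time attribute (tsp),
# the values stay in the same positions.
lx <- stats::lag(x, 1)
print(as.numeric(lx))   # values unchanged
print(tsp(lx))          # start and end moved one period earlier

# ts objects: cbind() aligns the series on time, so the lag shows up.
# k = -1 moves the series one period later, i.e. a lag in the usual sense.
tx <- ts(x)
aligned <- cbind(x = tx, x_lag1 = stats::lag(tx, -1))
print(aligned)          # one extra row: the lagged series runs one period longer

# Keep only the original time span to get back to length(x) rows.
trimmed <- window(aligned, end = end(tx))
print(trimmed)
```

This is why converting to `ts` (or using one of the vector-shifting approaches in this thread) is needed before `lag()` does anything useful.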
You could try the following, which I admit is not very elegant:
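# lead: next row's value (indexing past the end returns NA)
df2$l1pm10  <- sapply(seq_len(nrow(df2)), function(i) df2$pm10[i + 1])
# lag: previous row's value; pm10[0] would be numeric(0) and stop sapply()
# from simplifying to a vector, so return NA explicitly for the first row
df2$l1pm102 <- sapply(seq_len(nrow(df2)), function(i) if (i > 1) df2$pm10[i - 1] else NA)
#> df2
#   var1     pm10   l1pm10  l1pm102
#1     1 26.95607       NA       NA
#2     2       NA 32.83869 26.95607
#3     3 32.83869 39.95607       NA
#4     4 39.95607       NA 32.83869
#5     5       NA 40.95607 39.95607
#6     6 40.95607 33.95607       NA
#7     7 33.95607 28.95607 40.95607
#8     8 28.95607 32.34877 33.95607
#9     9 32.34877       NA 28.95607
#10   10       NA       NA 32.34877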
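The sapply() calls visit one row at a time. The same two columns can be built by shifting the whole vector at once. A minimal sketch, with the pm10 values copied from df2 and the comparison included only to illustrate that both routes agree:

```r
pm10 <- c(26.956073733, NA, 32.838694951, 39.9560737332, NA,
          40.9560737332, 33.956073733, 28.956073733, 32.348770798, NA)
n <- length(pm10)

# Row-by-row versions in the style of the answer above
# (explicit NA for the first lag so sapply() returns a plain vector)
lead_loop <- sapply(seq_len(n), function(i) pm10[i + 1])
lag_loop  <- sapply(seq_len(n), function(i) if (i > 1) pm10[i - 1] else NA)

# Vectorised versions: one shift of the whole vector
lead_vec <- c(pm10[-1], NA)   # drop the first value, pad the end
lag_vec  <- c(NA, pm10[-n])   # pad the start, drop the last value

identical(lead_loop, lead_vec)
identical(lag_loop, lag_vec)
```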
An alternative is to use the Lag() function (with a capital "L") from the Hmisc package:
library(Hmisc)
df2$l1pm10  <- Lag(df2$pm10, -1)  # negative shift: next row's value (lead)
df2$l1pm102 <- Lag(df2$pm10, +1)  # positive shift: previous row's value (lag)
#> df2
# var1 pm10 l1pm10 l1pm102
#1 1 26.95607 NA NA
#2 2 NA 32.83869 26.95607
#3 3 32.83869 39.95607 NA
#4 4 39.95607 NA 32.83869
#5 5 NA 40.95607 39.95607
#6 6 40.95607 33.95607 NA
#7 7 33.95607 28.95607 40.95607
#8 8 28.95607 32.34877 33.95607
#9 9 32.34877 NA 28.95607
#10 10 NA NA 32.34877
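If you would rather not load Hmisc, the sign convention used above (positive shift = lag, negative shift = lead) can be mimicked in base R. A minimal sketch; `shift_signed` is a hypothetical name, and this is not Hmisc's implementation:

```r
# Hypothetical base-R helper mirroring the sign convention of the Lag() calls above:
# n > 0 lags (values move down), n < 0 leads (values move up), NA fills the gap.
shift_signed <- function(x, n = 1L) {
  len <- length(x)
  k <- min(abs(n), len)                            # size of the shift, capped at the length
  if (n >= 0) {
    x[c(rep(NA_integer_, k), seq_len(len - k))]    # NAs first, then the earlier values
  } else {
    x[c(seq_len(len - k) + k, rep(NA_integer_, k))] # later values first, then NAs
  }
}

pm10 <- c(26.956073733, NA, 32.838694951, 39.9560737332, NA,
          40.9560737332, 33.956073733, 28.956073733, 32.348770798, NA)
cbind(pm10, lead1 = shift_signed(pm10, -1), lag1 = shift_signed(pm10, +1))
```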
I know an answer has already been accepted, but months ago I faced the same problem (in this question) and wanted to write a homemade lag function.
Here is the code:
df2$lagpm10 <- c(NA, df2$pm10[seq_along(df2$pm10) - 1])  # index 0 is dropped, so this is pm10[1:(n - 1)]
df2
var1 pm10 l1pm10 lagpm10
1 1 26.95607 26.95607 NA
2 2 NA NA 26.95607
3 3 32.83869 32.83869 NA
4 4 39.95607 39.95607 32.83869
5 5 NA NA 39.95607
6 6 40.95607 40.95607 NA
7 7 33.95607 33.95607 40.95607
8 8 28.95607 28.95607 33.95607
9 9 32.34877 32.34877 28.95607
10 10 NA NA 32.34877
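The homemade line relies on how R treats index 0: `seq_along(x) - 1` evaluates to `c(0, 1, ..., n - 1)`, and a zero index selects nothing, so `x[seq_along(x) - 1]` is simply `x[1:(n - 1)]`. A small check on a made-up vector:

```r
x <- c(10, 20, 30, 40)        # made-up example values

idx <- seq_along(x) - 1       # 0 1 2 3
x[idx]                        # the 0 selects nothing, leaving the first n - 1 values
lagged <- c(NA, x[idx])       # NA followed by the first n - 1 values

# Same result as dropping the last element explicitly
identical(lagged, c(NA, head(x, -1)))

# A single value also works: y[0] is empty, so the lag is just NA
y <- 5
identical(c(NA, y[seq_along(y) - 1]), NA_real_)
```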
Timing comparison, where Rhertel1 and Rhertel2 are the two lines of code from Rhertel's answer and Sabdem is my line:
Unit: microseconds
     expr     min      lq      mean   median       uq       max neval
 Rhertel1 250.523 257.740 272.07275 260.3355 264.0945  3540.187 10000
 Rhertel2 246.641 253.887 271.77003 256.5380 260.4935 14637.791 10000
   Sabdem  57.762  60.521  65.85315  61.3765  62.6050 12275.979 10000