Question
> library(data.table)
> tempDT <- data.table(colA = c("E","E","A","A","E","A","E")
+                      , lags = c(NA,1,1,2,3,1,2))
> tempDT
   colA lags
1:    E   NA
2:    E    1
3:    A    1
4:    A    2
5:    E    3
6:    A    1
7:    E    2
I have a column colA and need to find, for each row, the lag (the number of rows) between the current row and the most recent previous row whose colA == "E". The lags column above shows the expected result.
Note: if I could find the row index of the previous row whose colA == "E", I could calculate the lags from it. However, I don't know how to get that index.
Answer 1:
1) Define lastEpos, which given i returns the position of the last E among the first i rows. Apply it to each row number, then shift the result down by one row so that each row sees the last E strictly before it:
lastEpos <- function(i) tail(which(tempDT$colA[1:i] == "E"), 1)
tempDT[, lags := .I - shift(sapply(.I, lastEpos))]
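To see how (1) arrives at the result, the intermediate vectors can be traced on a plain character vector. This is only an illustrative sketch: it uses base R, with c(NA, head(pos, -1)) standing in for data.table's shift, and the names pos, prev and lags are introduced here for illustration.

```r
colA <- c("E", "E", "A", "A", "E", "A", "E")

# Position of the last "E" among the first i elements (same idea as lastEpos in (1)).
lastEpos <- function(i) tail(which(colA[1:i] == "E"), 1)

pos  <- sapply(seq_along(colA), lastEpos)  # 1 2 2 2 5 5 7
prev <- c(NA, head(pos, -1))               # shifted down by one: NA 1 2 2 2 5 5
lags <- seq_along(colA) - prev             # NA 1 1 2 3 1 2
print(lags)
```

Like (1), this trace assumes the first row is an E; if it is not, which() returns an empty vector for the leading rows and sapply no longer produces a plain integer vector.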
Here are a few variations:
2) i-1. In this variation lastEpos returns the position of the last E among the first i-1 rows rather than the first i, so no shift is needed. NA is prepended so that a row with no earlier E yields NA:
lastEpos <- function(i) tail(c(NA, which(tempDT$colA[seq_len(i-1)] == "E")), 1)
tempDT[, lags := .I - sapply(.I, lastEpos)]
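3) Position. Similar to (2) but uses base R's Position, which with right = TRUE returns the index of the last element for which the function is true, or NA if there is none. Here c simply passes each logical value through: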
lastEpos <- function(i) Position(c, tempDT$colA[seq_len(i-1)] == "E", right = TRUE)
tempDT[, lags := .I - sapply(.I, lastEpos)]
4) rollapply
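library(zoo)
# w[[i]] holds the offsets -(i-1), ..., -1, i.e. all rows before row i
w <- lapply(1:nrow(tempDT), function(i) -rev(seq_len(i-1)))
# within each window, find the position of the last TRUE (the last "E")
tempDT[, lags := .I - rollapply(colA == "E", w, Position, f = c, right = TRUE)]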
5) sqldf
library(sqldf)
# max(b.rowid) selects the nearest previous "E" row explicitly
sqldf("select a.colA, a.rowid - max(b.rowid) lags
  from tempDT a left join tempDT b
  on b.rowid < a.rowid and b.colA = 'E'
  group by a.rowid")
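For longer tables, the per-row scans in (1) to (3) could be replaced by a single vectorized pass. The following is a minimal sketch of that idea, not part of the original answer: lagToPrevE is a hypothetical helper name, it uses only base R (cummax), and it assumes the input contains no NA values.

```r
# Hypothetical helper: lag from each element to the previous element equal to target.
lagToPrevE <- function(x, target = "E") {
  idx <- seq_along(x)
  # Running maximum of the indices where x == target:
  # the position of the last target at or before each element (0 if none yet).
  lastE <- cummax(ifelse(x == target, idx, 0L))
  lastE[lastE == 0L] <- NA_integer_
  # Shift down by one so each element only sees targets strictly before it.
  prevE <- c(NA_integer_, head(lastE, -1))
  idx - prevE
}

colA <- c("E", "E", "A", "A", "E", "A", "E")
res <- lagToPrevE(colA)
print(res)  # NA 1 1 2 3 1 2
```

With data.table this could be applied as tempDT[, lags := lagToPrevE(colA)]. Unlike (1), it also returns NA for leading rows that have no earlier E.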
Source: https://stackoverflow.com/questions/49142288/r-data-table-find-lags-between-current-row-to-previous-row