Question
I have a panel data set of country-years. I would like to calculate the time since an event, and also get a running total of events per country that I can decay over time. I am using the timeSinceEvent function in the doBy package, which returns a data frame with the values I want, but I am having trouble applying it to my main df.
structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
42L, 42L, 42L, 42L, 42L), year = c(1975, 1976, 1977, 1978, 1979,
1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,
1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977, 1978,
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989,
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977,
1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988,
1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976,
1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987,
1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975,
1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985,
1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996,
1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004), onset.a = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ccode.a", "year",
"onset.a"), row.names = c(NA, 200L), class = "data.frame")
I have tried using this:
last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[,1],temp) #timeSinceEvent cuts off the country ID
}
result <- do.call("rbind", by(data, data$ccode.a, last.step))
As well as
test <- by(data, data$ccode.a, function(x) timeSinceEvent(data$onset.a, data$year))
To little avail. I stepped through the function, and it seems to be doing what I want, but I guess there is a problem in the way that I am calling it?
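A side note on the second attempt: the anonymous function receives each country's subset as x, but its body reads data$onset.a and data$year, so every group is computed from the full data frame. The sketch below shows the difference on a made-up toy data frame (toy and its values are assumptions for illustration only, not part of the original data):

```r
# Toy panel (made-up values): three rows for country 2, two for country 20.
toy <- data.frame(
  ccode.a = rep(c(2L, 20L), times = c(3, 2)),
  onset.a = c(0, 1, 0, 0, 0)
)

# Uses the subset passed in as x: one result per country.
per_group <- by(toy, toy$ccode.a, function(x) nrow(x))

# Ignores x and reads the full data frame: identical result for every country.
whole_df <- by(toy, toy$ccode.a, function(x) nrow(toy))

as.integer(per_group)
as.integer(whole_df)
```

The first call reports the size of each country's subset; the second reports the size of the whole data frame for both countries.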
Answer 1:
It seems to me the problem is simply that there are no events for ccode.a==20, and so timeSinceEvent returns NULL when applied to that subset. This means that last.step returns objects of different dimensions for different ccode.a values, and thus the rbind fails.
Not exactly a solution, but perhaps better understanding where the problem lies already helps.
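One quick way to check this diagnosis is to count events per country before calling anything from doBy. The following is a minimal base-R sketch on a made-up toy data frame (toy and its values are assumptions; with the real data, substitute the question's data frame):

```r
# Toy panel (made-up values): country 20 never has an event.
toy <- data.frame(
  ccode.a = rep(c(2L, 20L, 42L), each = 3),
  onset.a = c(0, 1, 1,  0, 0, 0,  1, 0, 0)
)

# Number of events observed in each country.
events_per_country <- tapply(toy$onset.a, toy$ccode.a, sum)

# Countries whose subset contains no event at all.
no_events <- names(events_per_country)[as.vector(events_per_country == 0)]
no_events
```

Any country listed in no_events is one for which the per-group call would have nothing to measure time from.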
Answer 2:
Since there are empty columns, you should use rbind.fill() from plyr. It fills the columns that are empty with NA.
library(doBy)  # provides timeSinceEvent
library(plyr)  # provides rbind.fill
last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[,1],temp) #timeSinceEvent cuts off the country ID
}
result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))
However, this won't return the "empty" results, i.e. the ones with only x[,1]. It will only rbind those list elements that have a data.frame inside. I don't know if this is the expected behaviour and/or what you want.
Answer 3:
I ended up having to modify timeSinceEvent in the doBy package a bit. Here is the final code that worked. Kudos to lselzer for pointing out rbind.fill in plyr, and to RoyalTS for pointing out that timeSinceEvent returns NULL when the yvar argument is all zeros.
panel.tse <- function(yvar, tvar = seq_along(yvar)){
  if (!(is.numeric(yvar) || is.logical(yvar))){
    stop("yvar must be either numeric or logical")
  }
  yvar[is.na(yvar)] <- 0   # treat missing values as "no event"
  run <- cumsum(yvar)      # running count of events so far
  un <- unique(run)
  tlist <- list()
  for (i in seq_along(un)){
    # time elapsed since the first observation of this run
    t <- tvar[run == un[[i]]]
    tlist[[i]] <- t - t[1]
  }
  timeAfterEvent <- unlist(tlist)
  # before the first event there is nothing to measure from
  timeAfterEvent[run == 0] <- NA
  run[run == 0] <- NA
  ans <- cbind(data.frame(yvar = yvar, tvar = tvar), run, tae = timeAfterEvent)
  return(ans)
}
library(plyr)  # provides rbind.fill
last.step <- function(x) {
  temp <- panel.tse(x$onset.a, x$year)
  cbind(x[,1],temp)
}
result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))
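For comparison, here is a sketch of a different route that does not use doBy or rbind.fill: base R's ave() computes grouped values and returns them in the original row order, so the new columns can be attached straight to the main data frame. This is not the method from the answers above. It assumes the rows are sorted by year within each country, that onset.a has no NA values, and (for the decayed total only) that years are consecutive. The toy data frame df and the decay factor are hypothetical.

```r
# Toy panel (made-up values): country 2 has events in 2001 and 2003,
# country 20 has none.
df <- data.frame(
  ccode.a = rep(c(2L, 20L), each = 5),
  year    = rep(2000:2004, times = 2),
  onset.a = c(0, 1, 0, 1, 0,  0, 0, 0, 0, 0)
)

# Running count of events within each country (0 before the first event,
# whereas panel.tse reports NA there).
df$run <- ave(df$onset.a, df$ccode.a, FUN = cumsum)

# Year of the most recent event so far: a running maximum over event years,
# with -Inf as a placeholder for "no event yet".
last <- ave(ifelse(df$onset.a == 1, df$year, -Inf), df$ccode.a, FUN = cummax)
last[is.infinite(last)] <- NA

# Time since the most recent event (NA before the first event).
df$tae <- df$year - last

# Decayed running total: each year the previous total shrinks by a
# hypothetical factor, then the current year's event (if any) is added.
decay <- 0.5
df$decayed <- ave(df$onset.a, df$ccode.a, FUN = function(y) {
  Reduce(function(prev, cur) decay * prev + cur, y, accumulate = TRUE)
})

df
```

Because every column is computed in place, countries without any event simply get NA in tae and 0 in run and decayed, and no row binding is needed.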
Source: https://stackoverflow.com/questions/11090404/time-to-event-for-panel-data