time to event for panel data

Posted by 半世苍凉 on 2019-12-11 14:04:48

Question


I have a panel data set of country-years. I would like to calculate the time since an event, as well as a running total of events per country that I can decay over time. I am using the timeSinceEvent function in the doBy package, which returns a data frame with the values I want, but I am having trouble applying it to my main data frame.

structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L), year = c(1975, 1976, 1977, 1978, 1979, 
1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 
1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 
2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977, 1978, 
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977, 
1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 
1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 
1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 
1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 
1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 
1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 
1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004), onset.a = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ccode.a", "year", 
"onset.a"), row.names = c(NA, 200L), class = "data.frame")

I have tried using this:

last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[,1],temp) #timeSinceEvent cuts off the country ID
}
result <- do.call("rbind", by(data, data$ccode.a, last.step))

As well as

test <- by(data, data$ccode.a, function(x) timeSinceEvent(data$onset.a, data$year))

To little avail. I stepped through the function, and it seems to be doing what I want, but I guess there is a problem in the way that I am calling it?


Answer 1:


It seems to me the problem is simply that there are no events for ccode.a == 20, so timeSinceEvent returns NULL when applied to that subset. This means that last.step returns pieces of different dimensions for different values of ccode.a, and thus the rbind fails.

Not exactly a solution, but perhaps better understanding where the problem lies already helps.
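
One way to check this diagnosis before calling timeSinceEvent is to count the events per country and list the groups that have none. The following is a minimal base R sketch on a small made-up panel (the `toy` data frame and the variable names are illustrative assumptions, not part of the original data):

```r
# Toy panel (hypothetical data): country 2 has one event, country 20 has none
toy <- data.frame(
  ccode.a = c(2L, 2L, 2L, 20L, 20L, 20L),
  year    = c(1975, 1976, 1977, 1975, 1976, 1977),
  onset.a = c(0, 1, 0, 0, 0, 0)
)

# Number of events per country
events_per_country <- tapply(toy$onset.a, toy$ccode.a, sum)

# Countries with no events at all; these are the groups for which the
# per-group step would return a differently shaped piece
no_event_groups <- names(events_per_country)[events_per_country == 0]
no_event_groups
```

Running the same two lines on the real data frame should show which ccode.a values need special handling.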




Answer 2:


Since some of the pieces have missing columns, you should use rbind.fill() from plyr. It fills the columns that are missing with NA.

library(doBy)  # for timeSinceEvent
library(plyr)  # for rbind.fill

last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[, 1], temp)  # timeSinceEvent cuts off the country ID
}
result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))

However, this won't return the "empty" pieces, i.e. the ones containing only x[,1]. It will only bind the pieces that hold a data frame. I don't know whether this is the expected behaviour or what you want.
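
To illustrate the idea of padding missing columns with NA before binding, here is a rough base R sketch. `bind_fill` is a hypothetical helper written for this illustration; it is not plyr's implementation and only mimics the column-filling behaviour for a list of data frames:

```r
# Hypothetical helper: row-bind data frames, padding absent columns with NA
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # add every column this piece lacks, filled with NA
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA
    }
    d[all_cols]  # same column order in every piece
  })
  do.call(rbind, padded)
}

# Made-up pieces: the second one has no 'tae' column
a <- data.frame(id = c(1L, 2L), tae = c(0, 1))
b <- data.frame(id = 3L)
out <- bind_fill(list(a, b))
out
```

With a helper like this, a piece that holds only the country ID still contributes its rows, with NA in the remaining columns.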




Answer 3:


I ended up having to modify timeSinceEvent from the doBy package a bit. Here is the final code that worked. Kudos to lselzer for pointing out rbind.fill in plyr and to RoyalTS for pointing out that timeSinceEvent returns NULL when the yvar argument is all zeros.

panel.tse <- function(yvar, tvar = seq_along(yvar)) {
  if (!(is.numeric(yvar) || is.logical(yvar))) {
    stop("yvar must be either numeric or logical")
  }
  yvar[is.na(yvar)] <- 0  # treat missing values as "no event"
  run <- cumsum(yvar)     # running count of events so far
  un <- unique(run)
  tlist <- list()
  for (i in seq_along(un)) {
    # time elapsed since the first observation of each run
    t <- tvar[run == un[[i]]]
    tlist[[i]] <- t - t[1]
  }
  timeAfterEvent <- unlist(tlist)
  # observations before the first event have no preceding event
  timeAfterEvent[run == 0] <- NA
  run[run == 0] <- NA
  # a data frame is returned even when yvar contains no events
  data.frame(yvar = yvar, tvar = tvar, run = run, tae = timeAfterEvent)
}

library(plyr)  # for rbind.fill

last.step <- function(x) {
  temp <- panel.tse(x$onset.a, x$year)
  cbind(ccode.a = x$ccode.a, temp)  # keep the country ID next to the result
}

result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))
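
For comparison, the same two quantities (running event count and time since the most recent event) can probably also be computed per country with base R's ave(), without splitting and re-binding the data. This is only a sketch on made-up data; it assumes the rows are already sorted by year within each country, and the `panel` data frame and `first_year` variable are illustrative names:

```r
# Toy panel (hypothetical data), sorted by year within country
panel <- data.frame(
  ccode.a = rep(c(2L, 20L), each = 5),
  year    = rep(2000:2004, times = 2),
  onset.a = c(0, 1, 0, 0, 1,  0, 0, 0, 0, 0)
)

# Running count of events within each country
panel$run <- ave(panel$onset.a, panel$ccode.a, FUN = cumsum)

# First year of each (country, run) block, i.e. the year of the latest event
first_year <- ave(panel$year, panel$ccode.a, panel$run,
                  FUN = function(y) y[1])

# Years since the latest event; NA before a country's first event
panel$tae <- ifelse(panel$run == 0, NA, panel$year - first_year)
panel
```

Countries without any event simply end up with NA in tae for all their rows, so no group has to be treated separately.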


Source: https://stackoverflow.com/questions/11090404/time-to-event-for-panel-data
