I have a well-balanced panel data set which contains NA observations. I will be using LOCF, and would like to know how many consecutive NA's are in each panel, before carry
This will do it:
data[, max(0L, with(rle(is.na(x)), lengths[values])), by = id]  # 0L guard: panels with no NA give 0 instead of -Inf
I just ran rle to find all runs of consecutive NA's and picked the max length.
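To make the mechanics concrete, here is a minimal sketch on a single made-up vector (the values in x are invented for illustration, not taken from the question's data). rle(is.na(x)) collapses the logical vector into runs, and the runs whose value is TRUE are the NA stretches:

```r
# Toy vector, invented for illustration
x <- c(1, NA, NA, 3, NA, NA, NA, 7)

r <- rle(is.na(x))
r$lengths            # 1 2 1 3 1
r$values             # FALSE TRUE FALSE TRUE FALSE

# Keep only the lengths of the NA runs, then take the longest
na_runs <- r$lengths[r$values]   # 2 3
longest <- max(0L, na_runs)      # 3; the 0L guards against a vector with no NA
```

The max(0L, ...) guard is my addition: without it, max() on an empty vector returns -Inf with a warning whenever a panel has no NA at all.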
Here's a rather convoluted answer to the question in the comments about recovering the date range for the above max:
data[, {
  tmp = rle(is.na(x))
  tmp$lengths[!tmp$values] = 0       # modify rle result to ignore non-NA's
  n = which.max(tmp$lengths)         # find the index in rle of longest NA sequence
  tmp = rle(is.na(x))                # let's get back to the unmodified rle
  start = sum(tmp$lengths[0:(n-1)]) + 1  # and find the start and end indices
  end = sum(tmp$lengths[1:n])
  list(date[start], date[end], max(tmp$lengths[tmp$values]))
}, by = id]
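The same index arithmetic can be packaged as a small helper, which may be easier to read and test on its own. This is only a sketch: longest_na_run is a hypothetical name, and returning NA positions with length 0 when there is no NA is my assumption about what is wanted in that case.

```r
# Hypothetical helper: locate the longest run of NA's in a vector.
longest_na_run <- function(x) {
  r <- rle(is.na(x))
  if (!any(r$values)) {
    # No NA at all: nothing to locate (assumed convention)
    return(list(start = NA_integer_, end = NA_integer_, length = 0L))
  }
  ends   <- cumsum(r$lengths)              # last position of every run
  starts <- ends - r$lengths + 1L          # first position of every run
  na_len <- ifelse(r$values, r$lengths, 0L)  # zero out the non-NA runs
  n <- which.max(na_len)                   # first of the longest NA runs
  list(start = starts[n], end = ends[n], length = r$lengths[n])
}

res <- longest_na_run(c(1, NA, NA, 3, NA, NA, NA, 7))
# res$start is 5, res$end is 7, res$length is 3
```

Inside data.table it could then be used along the lines of data[, {r = longest_na_run(x); list(date[r$start], date[r$end], r$length)}, by = id], assuming the same x and date columns as above.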
You can use rle with the modification suggested here (and pasted below) to count NA values.
foo <- data[, rle(x), by=id]
foo[is.na(values), max(lengths), by=id]
# id V1
# 1: 1 1
# 2: 2 3
# 3: 3 3
# 4: 4 3
# 5: 5 4
# 6: 6 5
# 7: 7 3
# 8: 8 5
# 9: 9 2
# 10: 10 2
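For context on why a modified rle is needed for this approach: the stock rle treats every missing value as its own run, so consecutive NA's never collapse into one. A quick check on an invented vector:

```r
# Toy vector, invented for illustration
x <- c(1, NA, NA, 2)

r <- base::rle(x)   # base:: makes sure the stock version is used
r$lengths           # 1 1 1 1  (the two adjacent NA's are not merged)
r$values            # 1 NA NA 2
```

With the stock function, max(lengths) over the NA rows would therefore always be 1, whatever the data look like.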
Amended rle function:
rle <- function (x)
{
  if (!is.vector(x) && !is.list(x))
    stop("'x' must be an atomic vector")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x),
                     class = "rle"))
  #### BEGIN NEW SECTION PART 1 ####
  naRepFlag <- FALSE
  IS_LOGIC <- FALSE  # defined up front so the check in part 2 also works when x has no NA
  if (any(is.na(x))) {
    naRepFlag <- TRUE
    IS_LOGIC <- typeof(x) == "logical"
    if (IS_LOGIC) {
      x <- as.integer(x)
      naMaskVal <- 2  # cannot clash with 0/1
    } else if (typeof(x) == "character") {
      # random 32-character string, assumed not to occur in x
      naMaskVal <- paste(sample(c(letters, LETTERS, 0:9), 32, replace = TRUE), collapse = "")
    } else {
      # one more than the largest finite magnitude in x
      naMaskVal <- max(0, abs(x[!is.infinite(x)]), na.rm = TRUE) + 1
    }
    x[which(is.na(x))] <- naMaskVal  # temporarily mask NA's so they compare equal
  }
  #### END NEW SECTION PART 1 ####
  y <- x[-1L] != x[-n]
  i <- c(which(y), n)
  #### BEGIN NEW SECTION PART 2 ####
  if (naRepFlag)
    x[which(x == naMaskVal)] <- NA   # restore the NA's
  if (IS_LOGIC)
    x <- as.logical(x)
  #### END NEW SECTION PART 2 ####
  structure(list(lengths = diff(c(0L, i)), values = x[i]),
            class = "rle")
}