How to insert missing observations into a data frame

Submitted by 柔情痞子 on 2019-11-28 01:42:51

This largely depends on how general you want your solution to be. If a non-general solution is enough, you can do #1 fairly simply. Here, I assume that you're using T as your time variable.

insert_miss <- function(df, time_val= "T", by= 1) {
  val <- df[[time_val]]  # extract the time column by name
  val_range <- range(val)
  comp <- seq(val_range[1], val_range[2], by=by)
  which_miss <- comp[!comp %in% val]
  # generating a sample row depends a lot on your particular problem
  # also, specifically how to impute the missing values depends on your 
  # specific problem / domain
  ## here's one simple solution which is not generic
  row_samp <- df[1,]
  df2 <- do.call("rbind", replicate(length(which_miss), row_samp, simplify= FALSE))
  df2[[time_val]] <- which_miss
  others <- which(names(df2) != time_val)
  df2[, others] <- NA
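  # note: only the newly generated (missing) rows are returned;
  # rbind() them with df if you need the combined data
  return(df2)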
}

Running it:

R> insert_miss(<your_df>)
   A cond   T Vlog
1 NA   NA 421   NA
2 NA   NA 422   NA
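
If you want the full data frame with the gaps already filled in, rather than only the missing rows, one possible sketch is to index the original rows with match(): positions with no match come back as NA rows. The toy data below is made up for illustration (the column names follow the output above, the values are assumptions), and the time column is assumed to contain no duplicates.

```r
# hypothetical toy data with T = 421 and T = 422 missing
df <- data.frame(A    = c(1, 2, 3, 4),
                 cond = "Si",
                 T    = c(419, 420, 423, 424),
                 Vlog = c(0.1, 0.2, 0.3, 0.4))

# complete sequence of time values over the observed range
full_T <- seq(min(df$T), max(df$T), by = 1)

# match() returns NA for time values that are absent from df;
# indexing rows with NA yields rows filled with NA
filled <- df[match(full_T, df$T), ]
filled$T <- full_T          # restore the time value in the inserted rows
rownames(filled) <- NULL    # tidy up the "NA", "NA.1" row names
```

As in the function above, every other column of an inserted row is NA; how to impute those values depends on your domain.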

Your example data doesn't match the chart image you posted, but here's an answer using random data:

library(ggplot2)  # needed for the plot at the end

# random x-y series
set.seed(123)
dat <- data.frame(x=1:200,
                  y=cumsum(rnorm(200)))

# punch some holes
dat <- dat[-c(20:40, 90:120), ]

# for each point, find gap to next point
diff2next <- with(dat, x[-1] - x[-nrow(dat)])

# now find position of non consecutive points (i.e. where gap > 1)
holes_start <- which(diff2next > 1)
holes_end <- holes_start + 1 #(by definition the gap ends with the next point)

# that's it. here's a plot of the line and the identified holes
ggplot() + 
  geom_line(data=dat, aes(x, y)) + # the line
  geom_point(data=dat[c(holes_start, holes_end), ], 
             aes(x, y), color='red') # the hole start/ends
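
The same gap detection can be summarised as a small table without plotting. This is only a sketch on the same random data; diff() is used as a shorthand for the x[-1] - x[-nrow(dat)] subtraction, and the names holes, from, to and missing are made up for illustration.

```r
# same random series with the same holes punched in
set.seed(123)
dat <- data.frame(x = 1:200,
                  y = cumsum(rnorm(200)))
dat <- dat[-c(20:40, 90:120), ]

# gap between each point and the next one
gap <- diff(dat$x)
holes_start <- which(gap > 1)

# one row per hole: last x before it, first x after it,
# and how many x values are missing in between
holes <- data.frame(from    = dat$x[holes_start],
                    to      = dat$x[holes_start + 1],
                    missing = gap[holes_start] - 1)
print(holes)
```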

Mirosław Zalewski

Assuming that your data frame is called ts.df and that the T variable is sequential (that is, it increases by exactly one at every data point), you can generate a data.frame with all T values in the range and outer join it to your existing data.frame, so that the missing rows are filled with NAs automatically:

ids <- data.frame(T=seq(from=min(ts.df$T), to=max(ts.df$T)), A=0, cond="Si")
ts.df <- merge(ts.df, ids, all.y=TRUE)
ggplot(ts.df, aes(T, Vlog)) + geom_line() + geom_point()

This assigns the value Si to the cond variable and 0 to the A variable for all rows. The first seems about right; the second is irrelevant for your chart anyway.

You might need to split the entire data.frame by condition, run the code above on the subset for one condition, and join the data.frames again to get it working with your current ggplot() call. But since you haven't posted a reproducible example of your problem, there is only so much I can do.
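
As a rough sketch of that per-condition idea, assuming made-up data: instead of splitting and re-joining by hand, you can build a grid of every T value for every condition with expand.grid() and left join the observed data onto it. The toy ts.df below, including the second condition "No", is hypothetical and only stands in for the real data.

```r
# hypothetical data: two conditions, each with gaps in T
ts.df <- data.frame(T    = c(1, 2, 5, 1, 3, 5),
                    cond = c("Si", "Si", "Si", "No", "No", "No"),
                    Vlog = c(0.5, 0.7, 0.9, 1.1, 1.3, 1.5))

# every T value in the observed range, for every condition
grid <- expand.grid(T    = seq(min(ts.df$T), max(ts.df$T), by = 1),
                    cond = unique(ts.df$cond),
                    stringsAsFactors = FALSE)

# keep all grid rows; Vlog is NA where no observation exists
filled <- merge(grid, ts.df, by = c("T", "cond"), all.x = TRUE)
filled <- filled[order(filled$cond, filled$T), ]
```

Joining on both T and cond keeps the observed Vlog values attached to the right condition, and the NA rows mark where each condition has a gap.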
