LOCF and NOCB methods for missing data: how to plot data?

戏子无情 submitted on 2021-02-20 03:49:20

Question


I'm working on the following dataset and its missing data:

# A tibble: 27 x 6
      id sex      d8   d10   d12   d14
   <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
 1     1 F      21    20    21.5  23  
 2     2 F      21    21.5  24    25.5
 3     3 NA     NA    24    NA    26  
 4     4 F      23.5  24.5  25    26.5
 5     5 F      21.5  23    22.5  23.5
 6     6 F      20    21    21    22.5
 7     7 F      21.5  22.5  23    25  
 8     8 F      23    23    23.5  24  
 9     9 F      NA    21    NA    21.5
10    10 F      16.5  19    19    19.5
# ... with 17 more rows

I would like to fill in the missing values using the Last Observation Carried Forward (LOCF) method and the Next Observation Carried Backward (NOCB) method. I would also like to produce a plot of the individual profiles over age by sex, highlighting the imputed values, and to compute the means and standard errors at each age by sex. Could you suggest how to set the arguments of the plot() function properly?

Does anyone have a suggestion?

Below is some code, taken from an example on another dataset, in case it is useful.

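library(mice)  # provides mdc(), the mice colour palette for observed/missing data
par(mfrow = c(1, 1))
Oz <- airquality$Ozone
# Last Observation Carried Forward: replace each NA with the most recent observed value.
# A leading NA stays NA because there is nothing to carry forward yet.
locf <- function(x) {
  a <- x[1]
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- a
    else a <- x[i]
  }
  return(x)
}
Ozi <- locf(Oz)
# Colour by missingness in the original series: mdc(2) for filled-in values, mdc(1) for observed
colvec <- ifelse(is.na(Oz), mdc(2), mdc(1))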

### Figure

plot(Ozi[1:80],col=colvec,type="l",xlab="Day number",ylab="Ozone (ppb)")
points(Ozi[1:80],col=colvec,pch=20,cex=1)

Answer 1:


Next Observation Carried Backward / Last Observation Carried Forward is probably a very bad choice for your data.

These algorithms are usually used for time series data, where carrying the last observation forward can be a good idea. For example, if you think of temperature measured every 10 minutes, the current outdoor temperature will very likely be close to the temperature 10 minutes earlier.
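To make the time series case concrete, here is a minimal base R sketch of both methods applied to a single series. The `temp` vector is made-up illustration data, and `locf()` / `nocb()` are hypothetical helper names, not functions from any package.

```r
# LOCF: carry the last observed value forward; leading NAs stay NA.
locf <- function(x) {
  # For each position, the index of the most recent non-NA value (0 if none yet)
  idx <- cummax(seq_along(x) * (!is.na(x)))
  idx[idx == 0] <- NA
  x[idx]
}

# NOCB: carry the next observed value backward, i.e. LOCF on the reversed series.
nocb <- function(x) rev(locf(rev(x)))

# Made-up temperature readings at regular intervals (illustration only)
temp <- c(NA, 12.1, NA, NA, 12.4, NA)

locf(temp)  # NA 12.1 12.1 12.1 12.4 12.4
nocb(temp)  # 12.1 12.1 12.4 12.4 12.4 NA
```

Note that each method leaves a gap at one end of the series: LOCF cannot fill a leading NA and NOCB cannot fill a trailing one.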

For cross-sectional data (it seems your rows are persons), the previous person is usually no more similar to the current person than any other randomly chosen person.

Take a look at the mice R package for your cross-sectional dataset. It offers much better algorithms for your case than LOCF/NOCB. Here is an overview of the functions it offers: https://amices.org/mice/reference/index.html

It also includes different plots for assessing the imputations.

Usually when using mice you create multiple possible imputations (it is worth reading about the technique of multiple imputation). But you can also produce just one imputed dataset with the package.
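As a rough sketch of producing a single imputed dataset, the code below calls mice() with m = 1 and extracts the result with complete(). The data frame `dat` is an assumption: it only re-types the four measurement columns of the ten rows shown in the question, and the choice of predictive mean matching ("pmm") and the seed are illustrative, not a recommendation for the real data.

```r
library(mice)

# Assumed toy data: the d8..d14 columns of the ten rows printed in the question
dat <- data.frame(
  d8  = c(21, 21, NA, 23.5, 21.5, 20, 21.5, 23, NA, 16.5),
  d10 = c(20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19),
  d12 = c(21.5, 24, NA, 25, 22.5, 21, 23, 23.5, NA, 19),
  d14 = c(23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5)
)

# m = 1 requests a single imputation; "pmm" is predictive mean matching
imp <- mice(dat, m = 1, method = "pmm", seed = 1, printFlag = FALSE)

# Extract the completed (observed + imputed) dataset
completed <- complete(imp, 1)

# Logical matrix marking which cells were imputed, useful for highlighting them in plots
was_imputed <- is.na(dat)
```

With m greater than 1, complete(imp, k) would return the k-th imputed dataset instead.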

There are the following functions for visualization of your imputations:

  • bwplot() (Box-and-whisker plot of observed and imputed data)
  • densityplot() (Density plot of observed and imputed data)
  • stripplot() (Stripplot of observed and imputed data)
  • xyplot() (Scatterplot of observed and imputed data)
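A short sketch of how two of these functions can be called is given below. It uses the `nhanes` example data shipped with mice rather than the data from the question, so the variable names `bmi` and `chl` belong to that example dataset; the number of imputations and the seed are arbitrary choices.

```r
library(mice)

# nhanes is a small example dataset included in the mice package
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Strip plot of chl per imputation; .imp = 0 is the observed data.
# Observed and imputed points are drawn in different colours.
p_strip <- stripplot(imp, chl ~ .imp, pch = 20, cex = 2)

# Scatterplot of bmi against chl, one panel per imputation
p_xy <- xyplot(imp, bmi ~ chl | .imp, pch = c(1, 20), cex = c(1, 1.5))

# These are lattice (trellis) objects; print() draws them
print(p_strip)
print(p_xy)
```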

Hope this helps a little. My advice would be to take a look at this package and then start over with a new approach based on what you learn.



Source: https://stackoverflow.com/questions/66197061/locf-and-nocf-methods-for-missing-data-how-to-plot-data
