Imputing missing values linearly in R

Posted by 依然范特西╮ on 2019-12-01 05:24:39

Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

## Make easily reproducible data
df <- read.table(text="X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62", header = TRUE)

## See how this works on a single vector
approxfun(1:9, df$X)(1:9)
# [1]  54 100  90  80  70  60  63  66  69

## Apply interpolation to each of the data.frame's columns
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
#     X  Y    Z
# 1  54 57 57.0
# 2 100 58 58.0
# 3  90 59 57.5
# 4  80 60 57.0
# 5  70 61 56.5
# 6  60 62 56.0
# 7  63 62 58.0
# 8  66 62 60.0
# 9  69 62 62.0
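One edge case the example data does not exercise: with its default `rule = 1`, approxfun() returns NA for positions outside the range of the observed values, so leading or trailing NAs stay missing. A minimal sketch, using a made-up vector `x` (not part of the original data), of how `rule = 2` carries the nearest observed value outward instead:

```r
## Hypothetical vector with missing values at both ends
x   <- c(NA, 10, NA, 20, NA)
idx <- seq_along(x)

## Default (rule = 1): only gaps between observed values are filled
inner_only <- approxfun(idx, x)(idx)
inner_only
# [1] NA 10 15 20 NA

## rule = 2: the closest observed value is reused beyond the ends
extended <- approxfun(idx, x, rule = 2)(idx)
extended
# [1] 10 10 15 20 20
```

Whether repeating the end values is appropriate depends on the data; it is constant extrapolation, not a continuation of the trend.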

I can recommend the imputeTS package, which I maintain. It is aimed at time series imputation, but it handles this case as well.

For this case it would work like this:

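library(imputeTS)
## imputeTS >= 3.0 names this function na_interpolation();
## earlier versions called it na.interpolation()
df$X <- na_interpolation(df$X, option = "linear")
df$Y <- na_interpolation(df$Y, option = "linear")
df$Z <- na_interpolation(df$Z, option = "linear")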

As mentioned, the package requires time series / vector input, which is why the function has to be called separately for each column.

The package also offers many other imputation functions, for example spline interpolation.
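To give a feel for what spline interpolation does differently from the linear version, here is a rough base-R sketch using stats::spline() on the X column from the question. This is an illustration of the idea only and is not claimed to match imputeTS's output exactly; within the package, the documented route is the `option = "spline"` argument of the interpolation function.

```r
## X column from the example data
x   <- c(54, 100, NA, NA, NA, 60, NA, NA, 69)
idx <- seq_along(x)
ok  <- !is.na(x)

## Fit a cubic spline through the observed points and
## evaluate it at every position, including the gaps
filled <- spline(idx[ok], x[ok], xout = idx)$y

## Observed values are reproduced; only the NA positions are new
round(filled, 1)
```

Unlike linear interpolation, a spline can overshoot the range of the neighbouring observed values, so it is worth plotting the result before relying on it.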
