Imputing missing values linearly in R

Submitted by 不想你离开。 on 2019-12-01 01:58:55

Question


I have a data frame with missing values:

X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62

I want to impute the NA values linearly from the known values so that the data frame looks like this:

X   Y    Z
54  57  57
100 58  58
90  59  57.5
80  60  57
70  61  56.5
60  62  56
63  62  58
66  62  60
69  62  62

thanks


Answer 1:


Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

## Make easily reproducible data
df <- read.table(text="X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62", header = TRUE)

## See how this works on a single vector
approxfun(1:9, df$X)(1:9)
# [1]  54 100  90  80  70  60  63  66  69

## Apply interpolation to each of the data.frame's columns
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
#     X  Y    Z
# 1  54 57 57.0
# 2 100 58 58.0
# 3  90 59 57.5
# 4  80 60 57.0
# 5  70 61 56.5
# 6  60 62 56.0
# 7  63 62 58.0
# 8  66 62 60.0
# 9  69 62 62.0
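
One case the example data does not exercise is an NA before the first or after the last known value. The following is a minimal sketch, using a made-up vector x that is not part of the question, of how approxfun() behaves there: with the default rule = 1 such positions stay NA, while rule = 2 fills them with the nearest known value.

```r
## Hypothetical vector with NAs at both ends (not from the question)
x <- c(NA, 1, NA, 3, NA)
idx <- seq_along(x)

## Default (rule = 1): positions outside the range of known values stay NA
inner_only <- approxfun(idx, x)(idx)

## rule = 2: positions outside the range take the nearest known value
edges_filled <- approxfun(idx, x, rule = 2)(idx)
```

Whether carrying the nearest value outward is acceptable depends on the data; it is constant extrapolation, not linear.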



Answer 2:


I can recommend the imputeTS package, which I maintain (although it is designed for time series imputation).

For this case it would work like this:

library(imputeTS)
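## imputeTS >= 3.0 names this function na_interpolation();
## older releases used na.interpolation()
df$X <- na_interpolation(df$X, option = "linear")
df$Y <- na_interpolation(df$Y, option = "linear")
df$Z <- na_interpolation(df$Z, option = "linear")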

As mentioned, the package requires time series or vector input, which is why each column has to be processed separately.

The package also offers many other imputation functions, such as spline interpolation.
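
To give a rough idea of what spline interpolation does with the same data, here is a sketch that uses base R's spline() rather than imputeTS. The helper name spline_fill is made up for illustration, and the choice of method = "fmm" is an assumption; the result will generally differ from the linear fill and can overshoot the neighbouring known values.

```r
## Same data as in the question
df <- data.frame(
  X = c(54, 100, NA, NA, NA, 60, NA, NA, 69),
  Y = c(57, 58, NA, NA, NA, 62, NA, NA, 62),
  Z = c(57, 58, NA, NA, NA, 56, NA, NA, 62)
)

## Hypothetical helper: fill NAs in one vector with a cubic spline
## fitted through the known values (base R, not imputeTS)
spline_fill <- function(y) {
  x  <- seq_along(y)
  ok <- !is.na(y)
  y[!ok] <- spline(x[ok], y[ok], xout = x[!ok], method = "fmm")$y
  y
}

## Apply column by column, as with the linear version
df_spline <- data.frame(lapply(df, spline_fill))
```

Known values are left untouched; only the NA positions are replaced.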



Source: https://stackoverflow.com/questions/22693173/imputing-missing-values-linearly-in-r
