Question:
I'm trying to impute some missing values in R using library(imputation) and kNNImpute(). The input data frame has 44 rows of 13 variables: 30 complete observations and 14 observations with missing values in 2 columns.
The function reports that it imputes all the missing values; however, it imputes the last 4 values as 0. From my reading of the code, this appears to be a flaw caused by using 0 as a default for errors. My code:
# impute data
library(imputation)
knn_data <- kNNImpute(x, k= 5)
# examine kNNImpute code
kNNImpute
kNNImpute's code: see lines 4 and 8, the function starting on line 24, and the second line from the bottom (line 48):
[4] prelim = impute.prelim(x)
[8] x.missing = prelim$x.missing
[24] x.missing.imputed = t(apply(x.missing, 1, function(i) {...}
[48] x[missing.matrix2] = 0
??impute.prelim returns no results (the help page is missing), so I can't examine this code.
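??impute.prelim searches the installed help pages, so an undocumented internal helper returns nothing even though its code is still available. As a hedged sketch (show_source is a hypothetical helper name, and the commented usage assumes the imputation package is installed), the function body can be fetched from the package namespace with base R's asNamespace(), whether or not the function is exported:

```r
# Hypothetical helper: return a function object from a package's namespace,
# including unexported (and therefore often undocumented) functions.
show_source <- function(name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed")
  }
  get(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Printing the returned function shows its source code, e.g.
# (assuming the imputation package is installed):
# show_source("impute.prelim", "imputation")
# The one-line equivalent uses the triple-colon operator:
# imputation:::impute.prelim
```

Either form prints the function's R code at the console, which is usually enough to see what a screening step like line [4] does.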
However, the program flow for kNNImpute appears to be:
[4] # run a (seemingly undocumented) screening function
[8] # pull in the rows with missing values for later imputation
[24] # run the imputation function
[48] # based on the output of line [4], set all "error rows" to 0
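The effect of line [48] can be reproduced with a toy matrix. This is only an illustration under an assumption: the package internals are not shown here, so missing.matrix2 is taken to be a logical matrix flagging the cells that are still NA after the kNN step. Every such cell is overwritten with 0, which makes a failed imputation indistinguishable from a genuine value of 0:

```r
# Toy data (made up): two missing cells, [1, 3] and [2, 2]
toy <- matrix(c(1,  2, NA,
                4, NA,  6,
                7,  8,  9), nrow = 3, byrow = TRUE)

# Suppose the kNN step fills only one of the two gaps
imputed <- toy
imputed[1, 3] <- 3  # successfully imputed value
# imputed[2, 2] is still NA, e.g. because the kNN step skipped that row

# Assumption: missing.matrix2 flags the cells that are still NA at the end
missing.matrix2 <- is.na(imputed)
imputed[missing.matrix2] <- 0  # same form as line [48]

imputed[2, 2]  # 0, although nothing was actually imputed there
```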
Can anyone explain why this is happening and/or how to solve this problem?
FYI: I have emailed the package author a link to this page.
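To confirm which cells are affected, one option is to compare the data before and after imputation and list the cells that were NA originally but came back as exactly 0. This is a hedged sketch: find_zero_imputations is a hypothetical helper, it assumes all-numeric data, and where kNNImpute stores the imputed matrix in its return value should be checked with str(knn_data) rather than assumed:

```r
# Hypothetical helper: cells that were NA before imputation and are exactly 0
# afterwards (possible failed imputations; a genuine imputed 0 is also listed).
# Assumes both inputs are all-numeric and have the same dimensions.
find_zero_imputations <- function(original, imputed) {
  original <- as.matrix(original)
  imputed <- as.matrix(imputed)
  stopifnot(identical(dim(original), dim(imputed)))
  which(is.na(original) & imputed == 0, arr.ind = TRUE)
}

# Toy usage (made-up values): cells [2, 1] and [2, 2] were missing;
# [2, 1] came back as 2.5 and [2, 2] came back as 0
before <- matrix(c(1, NA, 3, NA), nrow = 2)
after  <- matrix(c(1, 2.5, 3, 0), nrow = 2)
find_zero_imputations(before, after)
```

The result is a two-column matrix of row and column indices, so only cell [2, 2] is flagged in the toy example.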
Answer 1:
Solution: I used code identical to that in the kNNImpute() function to impute the 4 improperly imputed values.
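impute.fn <- function(scores, distances, raw_dist) {
  # values of the k nearest complete rows (names(distances) are their row numbers)
  knn.values <- scores[as.integer(names(distances))]
  # weight each neighbour by 1 - (its distance / largest distance to any complete row)
  knn.weights <- 1 - (distances / max(raw_dist))
  weighted.mean(knn.values, knn.weights)
}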
# impute errors - rows 41-44 are improperly imputed
# rows 1-30 have non-missing values
#---------------------------------------------------------
x.dist <- as.matrix(dist(x))
# distances from row 41 to the 30 complete rows, nearest first
dist_41 <- sort(x.dist[41, 1:30])
...
# fix impute - column 1
x$ABC[41] <- impute.fn(x$ABC, dist_41[1:5], dist_41)
...
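The manual fix above repeats the same two steps (sort the distances, call impute.fn) for every bad row and column. As a sketch only, the repetition can be folded into a loop. It assumes x still has its default row names 1, 2, 3, ... (impute.fn relies on names(distances) being row positions), and the wrapper fix_imputations and its arguments are hypothetical, not part of the imputation package:

```r
# Weighted kNN step, copied from impute.fn in the answer above
impute.fn <- function(scores, distances, raw_dist) {
  knn.values <- scores[as.integer(names(distances))]
  knn.weights <- 1 - (distances / max(raw_dist))
  weighted.mean(knn.values, knn.weights)
}

# Hypothetical wrapper: re-impute columns `cols` of rows `bad_rows` from the
# k nearest rows among `complete_rows` (assumes default row names 1..n).
fix_imputations <- function(x, bad_rows, cols, complete_rows, k = 5) {
  x.dist <- as.matrix(dist(x))  # computed once, as in the manual fix
  for (r in bad_rows) {
    d <- sort(x.dist[r, complete_rows])  # nearest complete rows first
    nearest <- d[seq_len(min(k, length(d)))]
    for (col in cols) {
      x[[col]][r] <- impute.fn(x[[col]], nearest, d)
    }
  }
  x
}

# Toy data (made up): rows 1-4 are complete, row 5 has a gap in ABC
toy <- data.frame(a = c(1, 2, 3, 10, 2.1),
                  ABC = c(5, 6, 7, 20, NA))
fixed <- fix_imputations(toy, bad_rows = 5, cols = "ABC",
                         complete_rows = 1:4, k = 2)
fixed$ABC[5]  # distance-weighted mean of rows 2 and 3 (ABC = 6 and 7)
```

With the question's data the call would look like fix_imputations(x, bad_rows = 41:44, cols = "ABC", complete_rows = 1:30), with the second affected column added to cols. Because raw_dist covers all complete rows, only the single farthest complete row gets weight 0, and it normally falls outside the k nearest.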
An appropriate answer from the package author (or anyone else) would still be appreciated.
Note: I have rewritten the imputation package for wKNN. The improved package can be found here: imputation
Source: https://stackoverflow.com/questions/20294603/r-knn-imputation-function-returning-erroneous-results-missing-help-page