MICE does not impute certain columns, but also does not give an error

*爱你&永不变心* posted on 2019-12-01 18:47:24

Ok, so here's the deal... mice relies on its predictorMatrix. This matrix determines which variables are used to predict the missing values of each variable. If a variable's entries in it are all zero, that variable will not be imputed, regardless of the method you specify.

You can check this matrix by running mice and then typing res$pred (short for res$predictorMatrix, where res is the object returned by mice). As you can see, the entries for k11 and k15 are all zero, which is why they aren't imputed. Purely as an example (NOT A SOLUTION), try specifying mice(pred = diag(ncol(Sparse_Data)), ...). You'll see that the two variables now get imputed. [Edit: For future readers: this is not a way to SOLVE the problem, just a way to show where the problem is.]
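If you'd rather not scan the matrix by eye, something like the following could list the variables that got zeroed out. It's only a sketch on a hand-made toy matrix: pred_toy and its variable names are made up here, and with a real run you would pass res$predictorMatrix instead.

```r
# Toy predictor matrix (hypothetical; stands in for res$predictorMatrix).
vars <- c("k1", "k8", "k11", "k14")
pred_toy <- matrix(1, nrow = 4, ncol = 4, dimnames = list(vars, vars))
diag(pred_toy) <- 0

# Mimic a variable that was dropped: its row and its column are all zero.
pred_toy["k11", ] <- 0
pred_toy[, "k11"] <- 0

# A variable with an all-zero row has no predictors;
# one with an all-zero column is not used to predict anything.
empty_row <- rowSums(pred_toy != 0) == 0
empty_col <- colSums(pred_toy != 0) == 0
dropped <- names(which(empty_row & empty_col))
print(dropped)
```

On the toy matrix this reports k11, the only variable whose row and column were cleared.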

So why does mice zero out those two variables? I looked into the mice source code. It contains a function called check.data, which calls find.collinear. That function flags the variables that are collinear, and the flagged variables are then removed in subsequent steps.

Are any of your columns collinear? Well, yes:

cor(Sparse_Data, use = "pairwise.complete.obs")
            k1            k3          k5            k6          k7           k8        k11        k12          k13         k14         k15
k1   1.0000000  1.740412e-01  0.24932705            NA  0.17164319  0.640984131  0.3053596  0.4225772 -0.536055739 -0.50460872  0.97321365
k3   0.1740412  1.000000e+00 -0.42409199 -9.370804e-05 -0.38583663  0.361416106  0.5515156  0.6567106  0.634250161 -0.70631658  0.74001342
k5   0.2493271 -4.240920e-01  1.00000000  4.471829e-01  0.02679894  0.234850334 -0.6624768  0.4201946 -0.924517670 -0.45408744 -0.78628746
k6          NA -9.370804e-05  0.44718290  1.000000e+00 -0.35377747  0.818644775  0.6824749  0.8899878  0.147657537  0.27030472  0.49159991
k7   0.1716432 -3.858366e-01  0.02679894 -3.537775e-01  1.00000000  0.207791538 -0.6406942 -0.2863018  0.898687181  0.14987951 -0.70210859
k8   0.6409841  3.614161e-01  0.23485033  8.186448e-01  0.20779154  1.000000000  0.7491736  0.5219197  0.002468839 -0.13067177  1.00000000
k11  0.3053596  5.515156e-01 -0.66247684  6.824749e-01 -0.64069422  0.749173578  1.0000000  0.5925582  0.830372468 -1.00000000  0.83452358
k12  0.4225772  6.567106e-01  0.42019459  8.899878e-01 -0.28630180  0.521919747  0.5925582  1.0000000 -0.134937885 -0.49251775  0.92582043
k13 -0.5360557  6.342502e-01 -0.92451767  1.476575e-01  0.89868718  0.002468839  0.8303725 -0.1349379  1.000000000  0.29508347  0.13853862
k14 -0.5046087 -7.063166e-01 -0.45408744  2.703047e-01  0.14987951 -0.130671767 -1.0000000 -0.4925177  0.295083470  1.00000000  0.02558161
k15  0.9732137  7.400134e-01 -0.78628746  4.915999e-01 -0.70210859  1.000000000  0.8345236  0.9258204  0.138538625  0.02558161  1.00000000

As you can see, k11 is perfectly correlated with k14 (r = -1), and k15 with k8 (r = 1). This is why they get kicked out.
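With more columns, picking the ±1 entries out of a printed table gets tedious. Here is a rough sketch of how the perfectly correlated pairs could be pulled out programmatically. The toy data frame and the tolerance are made up for illustration; this is not what find.collinear does internally, just the same kind of check done by hand.

```r
# Toy data (made up): b is an exact linear function of a, c is unrelated.
toy <- data.frame(
  a = c(1, 2, 3, 4, NA, 6),
  b = c(2, 4, 6, 8, 10, NA),
  c = c(5, 1, 4, 2, 3, 6)
)

cm <- cor(toy, use = "pairwise.complete.obs")

# Only look at the upper triangle so each pair is reported once.
tol <- 1e-8  # arbitrary tolerance for "perfect" correlation
hits <- which(abs(cm) > 1 - tol & upper.tri(cm), arr.ind = TRUE)

perfect_pairs <- data.frame(
  var1 = rownames(cm)[hits[, "row"]],
  var2 = colnames(cm)[hits[, "col"]],
  r    = cm[hits]
)
print(perfect_pairs)
```

On your own data you would replace toy with Sparse_Data. Keep in mind that with very sparse data a pair can show |r| = 1 simply because only two rows have both values observed.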

So, there are two solutions: either make sure there are no perfectly correlated pairs in your data, or, in this case, just provide the predictorMatrix yourself.
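For the second option, a hand-built matrix might look roughly like this. Treat it as a sketch under assumptions: the choice to drop k14 and k8 as predictors is mine (you could just as well drop k11 and k15 instead), and I have not checked how every mice version treats a user-supplied matrix, so inspect res$predictorMatrix afterwards.

```r
# Column names taken from the correlation table above.
vars <- c("k1", "k3", "k5", "k6", "k7", "k8", "k11", "k12", "k13", "k14", "k15")

# Start from "every variable predicts every other one" (zero diagonal).
pred <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
diag(pred) <- 0

# Assumption: not using k14 and k8 as predictors (zeroing their columns)
# leaves no perfectly correlated pair among the predictors.
pred[, c("k14", "k8")] <- 0

# Hypothetical call, not run here (needs the mice package and Sparse_Data):
# res <- mice::mice(Sparse_Data, predictorMatrix = pred)
```

The rows for k14 and k8 are left untouched, so those two variables can still be imputed from the others; they are just not used as predictors themselves.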

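Edit: To further prove my point, try running the code below before your own code. It changes one value in each of the four columns involved, which breaks the perfect correlations, and you'll see that the imputation then works: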

Sparse_Data$k11[1] <- 2
Sparse_Data$k15[1] <- 2
Sparse_Data$k8[1] <- 0.5
Sparse_Data$k14[1] <- 0.5