Question
I know that similar questions have been asked before (e.g., 1, 2, 3), but I still cannot understand why MICE fails to impute some missing values, even when I use unconditional mean imputation as in example 1.
The sparse matrix I have is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
[1,] NA NA NA NA NA NA NA NA NA NA 0.066667
[2,] 0.909091 NA NA NA NA 0.944723 NA NA 0.545455 NA NA
[3,] 0.545455 NA NA NA NA NA NA NA 0.818182 0.800000 0.466667
[4,] 0.545455 NA 0.642857 NA NA 0.260954 NA NA NA NA NA
[5,] NA 0.750 0.500000 NA 0.869845 NA 0.595013 NA NA NA NA
[6,] 0.727273 0.625 NA 0.583333 NA NA NA 0.500000 0.545455 NA NA
[7,] NA NA 0.571429 NA NA NA NA NA NA NA 0.866667
[8,] 0.545455 NA NA NA NA 0.905593 0.677757 NA NA NA NA
[9,] NA 0.999 0.714286 0.750000 NA NA 0.881032 NA NA 0.933333 0.733333
[10,] NA 0.750 NA NA NA NA NA NA 0.545455 NA NA
[11,] NA NA NA NA NA NA NA NA 0.818182 NA NA
[12,] NA 0.999 NA 0.583333 NA NA 0.986145 0.666667 0.909091 NA NA
[13,] 0.818182 NA 0.857143 0.583333 0.001000 NA NA NA NA 0.133333 NA
[14,] NA 0.999 0.357143 NA 0.635087 NA NA NA NA NA NA
[15,] NA 0.750 0.857143 0.250000 0.742082 0.001000 0.001000 NA 0.636364 NA 0.533333
[16,] NA 0.999 NA 0.250000 NA NA NA NA 0.909091 NA NA
[17,] 0.727273 0.999 0.001000 NA NA NA 0.886366 0.666667 0.909091 0.800000 0.933333
[18,] NA NA 0.571429 NA NA 0.953382 NA 0.833333 0.727273 NA NA
[19,] NA NA NA NA 0.661476 NA NA 0.500000 NA 0.933333 0.600000
[20,] NA NA 0.857143 NA 0.661661 0.459014 0.283793 NA NA NA NA
[21,] NA NA NA NA NA NA NA NA NA NA 0.800000
[22,] 0.454545 NA NA NA NA NA NA 0.333333 0.727273 NA 0.533333
[23,] NA NA NA 0.333333 0.790737 NA NA NA 0.727273 0.433333 NA
[24,] NA 0.875 NA NA NA NA NA NA NA 0.999000 NA
[25,] NA NA 0.571429 0.583333 NA NA 0.196147 0.500000 NA NA NA
[26,] NA 0.999 0.642857 0.250000 NA NA NA NA 0.636364 0.700000 NA
[27,] NA NA 0.714286 NA NA NA NA NA NA NA NA
[28,] NA 0.875 NA 0.500000 NA NA NA NA NA NA 0.666667
[29,] 0.636364 0.750 NA NA NA 0.999000 0.999000 NA NA NA NA
[30,] 0.727273 NA NA NA 0.916098 0.734748 NA NA NA 0.833333 NA
[31,] NA NA NA NA NA NA NA NA NA NA 0.733333
[32,] NA 0.875 NA 0.500000 NA NA NA NA 0.818182 NA NA
[33,] 0.636364 NA NA NA NA NA 0.829819 NA 0.727273 NA 0.733333
[34,] NA NA 0.500000 NA NA NA NA NA NA NA 0.666667
[35,] NA NA 0.214286 NA NA 0.529592 NA 0.001000 0.909091 NA NA
[36,] NA NA NA 0.416667 0.808369 NA NA 0.500000 0.909091 0.633333 0.733333
[37,] NA NA 0.357143 NA NA 0.837555 0.755077 NA 0.818182 NA NA
[38,] NA NA NA 0.166667 0.841643 0.364216 NA NA NA 0.733333 NA
[39,] NA NA 0.500000 0.750000 NA NA NA NA 0.818182 0.999000 0.800000
[40,] NA NA NA NA 0.931836 NA NA NA NA NA 0.133333
[41,] NA NA 0.714286 NA NA 0.848688 NA NA NA NA NA
[42,] NA NA 0.214286 0.333333 0.700812 0.208412 NA 0.333333 NA NA NA
[43,] 0.454545 NA NA NA 0.109326 0.346767 0.877241 0.833333 NA NA NA
[44,] 0.818182 NA 0.857143 NA NA 0.931636 NA NA NA 0.733333 NA
[45,] 0.363636 0.750 NA NA NA NA NA 0.166667 0.818182 NA NA
[46,] NA NA 0.785714 NA 0.738672 NA NA NA NA 0.100000 NA
[47,] 0.181818 NA NA NA NA NA NA NA NA NA 0.001000
[48,] NA NA 0.001000 0.083333 0.308050 0.139592 NA 0.166667 NA NA NA
[49,] NA NA NA NA 0.561841 0.817696 NA 0.666667 NA 0.300000 NA
[50,] NA NA NA 0.416667 NA NA NA NA 0.545455 NA 0.866667
[51,] NA 0.875 NA NA 0.039781 NA NA NA NA 0.933333 NA
[52,] NA NA 0.357143 NA NA NA NA 0.333333 NA NA NA
[53,] NA 0.999 NA NA NA 0.835015 NA NA NA 0.833333 0.666667
[54,] NA 0.750 NA 0.416667 NA NA 0.623528 0.333333 0.818182 NA NA
[55,] NA NA NA 0.666667 NA 0.878312 NA NA NA NA NA
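I then apply the standard mice function as follows:
library(mice)
# Mean imputation; m defaults to 5 imputed datasets, each run for 30 iterations
res <- mice(Sparse_Data, maxit = 30, meth = 'mean', seed = 500, print = FALSE)
# Long format, including the original data as .imp == 0
t <- complete(res, action = "long", include = TRUE)
# Drop the original data, then average the imputed datasets element-wise
out <- split(t, f = t$.imp)[-1]
a <- Reduce("+", out) / length(out)
# Drop the .imp and .id bookkeeping columns
data_Pred <- a[, 3:ncol(a)]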
The predicted matrix I get is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
56 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.066667
57 0.9090910 0.8676667 0.5373542 0.4429824 0.6069598 0.9447230 NA 0.4583958 0.5454550 0.6959606 NA
58 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.8000000 0.466667
59 0.5454550 0.8676667 0.6428570 0.4429824 0.6069598 0.2609540 NA 0.4583958 0.7561986 0.6959606 NA
60 0.6060607 0.7500000 0.5000000 0.4429824 0.8698450 0.6313629 0.595013 0.4583958 0.7561986 0.6959606 NA
61 0.7272730 0.6250000 0.5373542 0.5833330 0.6069598 0.6313629 NA 0.5000000 0.5454550 0.6959606 NA
62 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.866667
63 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.9055930 0.677757 0.4583958 0.7561986 0.6959606 NA
64 0.6060607 0.9990000 0.7142860 0.7500000 0.6069598 0.6313629 0.881032 0.4583958 0.7561986 0.9333330 0.733333
65 0.6060607 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 NA
66 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
67 0.6060607 0.9990000 0.5373542 0.5833330 0.6069598 0.6313629 0.986145 0.6666670 0.9090910 0.6959606 NA
68 0.8181820 0.8676667 0.8571430 0.5833330 0.0010000 0.6313629 NA 0.4583958 0.7561986 0.1333330 NA
69 0.6060607 0.9990000 0.3571430 0.4429824 0.6350870 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
70 0.6060607 0.7500000 0.8571430 0.2500000 0.7420820 0.0010000 0.001000 0.4583958 0.6363640 0.6959606 0.533333
71 0.6060607 0.9990000 0.5373542 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.9090910 0.6959606 NA
72 0.7272730 0.9990000 0.0010000 0.4429824 0.6069598 0.6313629 0.886366 0.6666670 0.9090910 0.8000000 0.933333
73 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.9533820 NA 0.8333330 0.7272730 0.6959606 NA
74 0.6060607 0.8676667 0.5373542 0.4429824 0.6614760 0.6313629 NA 0.5000000 0.7561986 0.9333330 0.600000
75 0.6060607 0.8676667 0.8571430 0.4429824 0.6616610 0.4590140 0.283793 0.4583958 0.7561986 0.6959606 NA
76 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.800000
77 0.4545450 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7272730 0.6959606 0.533333
78 0.6060607 0.8676667 0.5373542 0.3333330 0.7907370 0.6313629 NA 0.4583958 0.7272730 0.4333330 NA
79 0.6060607 0.8750000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.9990000 NA
80 0.6060607 0.8676667 0.5714290 0.5833330 0.6069598 0.6313629 0.196147 0.5000000 0.7561986 0.6959606 NA
81 0.6060607 0.9990000 0.6428570 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.6363640 0.7000000 NA
82 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
83 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
84 0.6363640 0.7500000 0.5373542 0.4429824 0.6069598 0.9990000 0.999000 0.4583958 0.7561986 0.6959606 NA
85 0.7272730 0.8676667 0.5373542 0.4429824 0.9160980 0.7347480 NA 0.4583958 0.7561986 0.8333330 NA
86 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.733333
87 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
88 0.6363640 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 0.829819 0.4583958 0.7272730 0.6959606 0.733333
89 0.6060607 0.8676667 0.5000000 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
90 0.6060607 0.8676667 0.2142860 0.4429824 0.6069598 0.5295920 NA 0.0010000 0.9090910 0.6959606 NA
91 0.6060607 0.8676667 0.5373542 0.4166670 0.8083690 0.6313629 NA 0.5000000 0.9090910 0.6333330 0.733333
92 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.8375550 0.755077 0.4583958 0.8181820 0.6959606 NA
93 0.6060607 0.8676667 0.5373542 0.1666670 0.8416430 0.3642160 NA 0.4583958 0.7561986 0.7333330 NA
94 0.6060607 0.8676667 0.5000000 0.7500000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.9990000 0.800000
95 0.6060607 0.8676667 0.5373542 0.4429824 0.9318360 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.133333
96 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.8486880 NA 0.4583958 0.7561986 0.6959606 NA
97 0.6060607 0.8676667 0.2142860 0.3333330 0.7008120 0.2084120 NA 0.3333330 0.7561986 0.6959606 NA
98 0.4545450 0.8676667 0.5373542 0.4429824 0.1093260 0.3467670 0.877241 0.8333330 0.7561986 0.6959606 NA
99 0.8181820 0.8676667 0.8571430 0.4429824 0.6069598 0.9316360 NA 0.4583958 0.7561986 0.7333330 NA
100 0.3636360 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.1666670 0.8181820 0.6959606 NA
101 0.6060607 0.8676667 0.7857140 0.4429824 0.7386720 0.6313629 NA 0.4583958 0.7561986 0.1000000 NA
102 0.1818180 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.001000
103 0.6060607 0.8676667 0.0010000 0.0833330 0.3080500 0.1395920 NA 0.1666670 0.7561986 0.6959606 NA
104 0.6060607 0.8676667 0.5373542 0.4429824 0.5618410 0.8176960 NA 0.6666670 0.7561986 0.3000000 NA
105 0.6060607 0.8676667 0.5373542 0.4166670 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 0.866667
106 0.6060607 0.8750000 0.5373542 0.4429824 0.0397810 0.6313629 NA 0.4583958 0.7561986 0.9333330 NA
107 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7561986 0.6959606 NA
108 0.6060607 0.9990000 0.5373542 0.4429824 0.6069598 0.8350150 NA 0.4583958 0.7561986 0.8333330 0.666667
109 0.6060607 0.7500000 0.5373542 0.4166670 0.6069598 0.6313629 0.623528 0.3333330 0.8181820 0.6959606 NA
110 0.6060607 0.8676667 0.5373542 0.6666670 0.6069598 0.8783120 NA 0.4583958 0.7561986 0.6959606 NA
Maybe someone can shed some light on the problem?
Answer 1:
OK, so here's the deal: mice relies on its predictorMatrix. This is a matrix that determines which columns are used to predict the missing values of each variable. If a column is empty, then that variable will not be predicted, regardless of what method you specify.
You can check this matrix by running mice and then typing res$pred. As you can see, the columns for k11 and k15 are empty, and therefore they aren't imputed. Purely as an example (NOT A SOLUTION), try specifying mice(pred = diag(ncol(Sparse_Data)), ...). You'll see that now it works. [Edit: For future readers: this is not a way to SOLVE the problem, just a way to show where the problem is.]
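To make "empty" concrete, here is a minimal base-R sketch. The helper name empty_entries and the toy matrix are made up for illustration and are not part of mice; the only thing assumed about mice is its documented convention that each row of the predictor matrix is a variable being imputed and each column is a candidate predictor.

```r
# Hypothetical helper (not part of mice): list the variables whose row or
# column in a 0/1 predictor matrix contains only zeros.
empty_entries <- function(pred) {
  list(
    # all-zero row: nothing is used to impute this variable
    not_imputed_from_others = rownames(pred)[rowSums(pred != 0) == 0],
    # all-zero column: this variable is not used to impute anything else
    not_used_as_predictor = colnames(pred)[colSums(pred != 0) == 0]
  )
}

# Toy matrix: full matrix with a zero diagonal, similar to mice's default
vars <- c("k8", "k11", "k14", "k15")
pred <- 1 - diag(length(vars))
dimnames(pred) <- list(vars, vars)

# Mimic a variable that has been dropped entirely
pred["k11", ] <- 0
pred[, "k11"] <- 0

found <- empty_entries(pred)
print(found)
```

With the fitted object from the question, something like empty_entries(res$predictorMatrix) should list the affected variables. The loggedEvents component of the same object is also worth a look, since that is where mice usually records variables it dropped.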
So why does mice make those two columns empty? Well, I tried looking into the source code of mice... Within it, there is a function called check.data. Within that, there is a call to find.collinear, which in turn will specify which variables are collinear, which will then be removed in subsequent steps.
Are any of your columns collinear? Well, yes:
cor(Sparse_Data, use = "pairwise.complete.obs")
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
k1 1.0000000 1.740412e-01 0.24932705 NA 0.17164319 0.640984131 0.3053596 0.4225772 -0.536055739 -0.50460872 0.97321365
k3 0.1740412 1.000000e+00 -0.42409199 -9.370804e-05 -0.38583663 0.361416106 0.5515156 0.6567106 0.634250161 -0.70631658 0.74001342
k5 0.2493271 -4.240920e-01 1.00000000 4.471829e-01 0.02679894 0.234850334 -0.6624768 0.4201946 -0.924517670 -0.45408744 -0.78628746
k6 NA -9.370804e-05 0.44718290 1.000000e+00 -0.35377747 0.818644775 0.6824749 0.8899878 0.147657537 0.27030472 0.49159991
k7 0.1716432 -3.858366e-01 0.02679894 -3.537775e-01 1.00000000 0.207791538 -0.6406942 -0.2863018 0.898687181 0.14987951 -0.70210859
k8 0.6409841 3.614161e-01 0.23485033 8.186448e-01 0.20779154 1.000000000 0.7491736 0.5219197 0.002468839 -0.13067177 1.00000000
k11 0.3053596 5.515156e-01 -0.66247684 6.824749e-01 -0.64069422 0.749173578 1.0000000 0.5925582 0.830372468 -1.00000000 0.83452358
k12 0.4225772 6.567106e-01 0.42019459 8.899878e-01 -0.28630180 0.521919747 0.5925582 1.0000000 -0.134937885 -0.49251775 0.92582043
k13 -0.5360557 6.342502e-01 -0.92451767 1.476575e-01 0.89868718 0.002468839 0.8303725 -0.1349379 1.000000000 0.29508347 0.13853862
k14 -0.5046087 -7.063166e-01 -0.45408744 2.703047e-01 0.14987951 -0.130671767 -1.0000000 -0.4925177 0.295083470 1.00000000 0.02558161
k15 0.9732137 7.400134e-01 -0.78628746 4.915999e-01 -0.70210859 1.000000000 0.8345236 0.9258204 0.138538625 0.02558161 1.00000000
As you can see, k11 is perfectly correlated with k14, and k15 with k8. This is why they get kicked out.
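With data this sparse, it is worth checking how many rows each such correlation is actually based on: a correlation computed from exactly two overlapping rows is always +1 or -1 (as long as neither column is constant on those rows). The sketch below is plain base R, not mice's own check. The function name flag_near_perfect_pairs and the 0.999 cutoff are assumptions made for illustration.

```r
# Hypothetical helper (not mice's implementation): report column pairs whose
# pairwise-complete correlation is at or above a cutoff in absolute value,
# together with the number of rows where both columns are observed.
flag_near_perfect_pairs <- function(x, threshold = 0.999) {
  x <- as.matrix(x)
  # Pairs with too little overlap give NA (with a warning), so silence that
  r <- suppressWarnings(cor(x, use = "pairwise.complete.obs"))
  observed <- (!is.na(x)) * 1          # 1 where a value is present
  n_overlap <- crossprod(observed)     # rows observed in both columns
  hit <- upper.tri(r) & !is.na(r) & abs(r) >= threshold
  idx <- which(hit, arr.ind = TRUE)
  data.frame(
    var1 = colnames(x)[idx[, 1]],
    var2 = colnames(x)[idx[, 2]],
    cor = r[idx],
    n_overlap = n_overlap[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data: columns a and b share only two observed rows
toy <- cbind(
  a = c(1, 2, NA, NA, 5),
  b = c(2, 4, NA, 1, NA),
  c = c(1, 3, 2, 5, 4)
)
pairs <- flag_near_perfect_pairs(toy)
print(pairs)
```

Running flag_near_perfect_pairs(Sparse_Data) on the matrix from the question would show whether the flagged pairs rest on only a handful of shared rows.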
So, there are two solutions: either make sure that there are no perfectly correlated pairs in your matrix, or, in this case, just provide the predictorMatrix yourself.
Edit: To further prove my point, try running this code before your code and you'll see that it indeed works:
# Give row 1 a value in all four columns so each pair gains an overlapping row
Sparse_Data[1, "k11"] <- 2
Sparse_Data[1, "k15"] <- 2
Sparse_Data[1, "k8"] <- 0.5
Sparse_Data[1, "k14"] <- 0.5
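As a cross-check on what meth='mean' should produce, unconditional mean imputation is easy to reproduce by hand in base R. This is only a sketch: impute_col_means and the toy matrix are hypothetical names used for illustration, and the code does not go through mice at all, so the collinearity check never comes into play.

```r
# Hypothetical helper (not part of mice): replace each NA with the mean of
# the observed values in the same column.
impute_col_means <- function(x) {
  x <- as.matrix(x)
  col_means <- colMeans(x, na.rm = TRUE)   # NaN for a column that is all NA
  missing <- which(is.na(x), arr.ind = TRUE)
  # Look up the column mean for every missing cell and fill it in
  x[missing] <- col_means[missing[, 2]]
  x
}

# Toy data with the same flavor as the question's matrix
toy_sparse <- cbind(
  k8 = c(0.2, NA, 0.6),
  k15 = c(NA, 0.4, NA)
)
filled <- impute_col_means(toy_sparse)
print(filled)
```

Comparing impute_col_means(Sparse_Data) with the averaged mice output is a quick way to see which columns mice skipped.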
Source: https://stackoverflow.com/questions/36330570/mice-does-not-impute-certain-columns-but-also-does-not-give-an-error