I use a tensorflow to implement a simple multi-layer perceptron for regression. The code is modified from standard mnist classifier, that I only changed the output cost to M
There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure pred
and y
are correctly being updated and you did not overwrite them with a different graph operation.
Another thing to help debug would be to predict the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.
Short answer:
Transpose your pred
vector using tf.transpose(pred)
.
Longer answer:
The problem is that pred
(the predictions) and y
(the labels) are not of the same shape: one is a row vector and the other a column vector. Apparently when you apply an element-wise operation on them, you'll get a matrix, which is not what you want.
The solution is to transpose the prediction vector using tf.transpose()
to get a proper vector and thus a proper loss function. Actually, if you set the batch size to 1 in your example you'll see that it works even without the fix, because transposing a 1x1 vector is a no-op.
I applied this fix to your example code and observed the following behaviour. Before the fix:
Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]
And after the fix at the same point in time:
Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]
You'll see that the cost is much lower and that it actually learned the value 50 properly. You'll have to do some fine-tuning on the learning rate and such to improve your results of course.