Question
I've implemented a home-brewed ZFNet (prototxt) for my research. After 20k iterations with this definition, the test accuracy stays at ~0.001 (i.e., 1/1000), the test loss at ~6.9, and the training loss at ~6.9, which suggests the net is just playing guessing games among the 1k classes (6.9 ≈ ln(1000), the cross-entropy loss of a uniform guess). I've thoroughly checked the whole definition and tried changing some of the hyper-parameters to start a new training, but to no avail; the same results show up on the screen....
Could anyone shed some light on this? Thanks in advance!
The hyper-parameters in the prototxt are derived from the paper [1]. All the inputs and outputs of the layers seem correct, matching Fig. 3 in the paper.
The tweaks are (see the sketch after this list):

- the input crop sizes for both training and testing are set to `225` instead of `224`, as discussed in #33;
- one-pixel zero padding for `conv3`, `conv4`, and `conv5` to keep the blob sizes consistent [1];
- filler types for all learnable layers changed from `constant` in [1] to `gaussian` with `std: 0.01`;
- `weight_decay` changed from `0.0005` to `0.00025`, as suggested by @sergeyk in PR #33.
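For illustration, here is a minimal prototxt sketch of how those tweaks could look (layer names, blob names, and data paths are placeholders, not my actual definition; the `conv3` shape is the 3x3/384 layer from Fig. 3 in [1]):

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 225                           # 225 instead of 224, as discussed in #33
    mirror: true
    mean_file: "imagenet_mean.binaryproto"   # placeholder path
  }
  data_param {
    source: "ilsvrc12_train_lmdb"            # placeholder path
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 384                          # 3x3, 384 filters per Fig. 3 in [1]
    kernel_size: 3
    pad: 1                                   # one-pixel zero padding for consistent blob sizes
    weight_filler { type: "gaussian" std: 0.01 }   # instead of constant
    bias_filler { type: "constant" value: 0 }
  }
}
```

The `weight_decay: 0.00025` change goes in the solver prototxt rather than in the net definition.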
[1] Zeiler, M. and Fergus, R. Visualizing and Understanding Convolutional Networks, ECCV 2014.
And as for the problematic part..., I pasted it here
Answer 1:
A few suggestions (a sketch follows this list):

- Change the initialization from `gaussian` to `xavier`.
- Work with `"PReLU"` activations instead of `"ReLU"`; once your net converges you can fine-tune to remove them.
- Try reducing `base_lr` by an order of magnitude (or even two orders).
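For example, the filler and activation changes could look like this in the net prototxt (layer names are illustrative; the shape is the 7x7/96, stride-2 `conv1` from [1]):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96                       # ZFNet conv1 per [1]
    kernel_size: 7
    stride: 2
    weight_filler { type: "xavier" }     # instead of gaussian with std: 0.01
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu1"
  type: "PReLU"                          # instead of type: "ReLU"; learnable negative slope
  bottom: "conv1"
  top: "conv1"
}
```

The learning-rate change goes in the solver prototxt, e.g.:

```
# solver.prototxt (relevant line only)
base_lr: 0.001   # e.g. down one order of magnitude from 0.01
```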
Source: https://stackoverflow.com/questions/39663506/test-accuracy-cannot-improve-when-learning-zfnet-on-ilsvrc12