When training convolutional neural networks for image classification we generally want the algorithm to learn the filters (and biases) that transform a given image into its predicted class. Adding up all those variables, we would expect to get a model.ckpt.data file of about 12.45MB.
Traditionally, most of a model's parameters are in the first fully connected layer, in this case wd1. Computing its size alone yields:

7*7*128 * 1024 * 4 = 25690112

... i.e. roughly 25.7MB. Note the factor of 4: the variable has dtype=tf.float32, i.e. 4 bytes per parameter. Other layers also contribute to the model size, but not so drastically.
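The same arithmetic as a quick Python sketch (using the layer shape assumed above):

# Number of weights in wd1 times 4 bytes per float32 parameter.
fc1_bytes = 7 * 7 * 128 * 1024 * 4
print(fc1_bytes, fc1_bytes / 1e6, 'MB')   # 25690112 bytes, ~25.7 MB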
As you can see, your estimate of 12.45MB is a bit off (did you use 16 bits per parameter?). The checkpoint also stores some general bookkeeping information, hence an overhead of around 25%, which is still big, but not 300%.
[Update]
As was clarified, the model in question actually has an FC1 layer of shape [7*7*64, 1024]. So the size calculated above should indeed come out to roughly 12.5MB. That made me look into the saved checkpoint more carefully.
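If you want to inspect a checkpoint yourself, here is a minimal sketch; the path 'model.ckpt' is a placeholder, and it assumes a TF version that provides tf.train.list_variables:

import tensorflow as tf

# List the name and shape of every tensor stored in the checkpoint.
for name, shape in tf.train.list_variables('model.ckpt'):
    print(name, shape)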
After inspecting it, I noticed other big variables that I missed originally:
...
Variable_2 (DT_FLOAT) [3136,1024]
Variable_2/Adam (DT_FLOAT) [3136,1024]
Variable_2/Adam_1 (DT_FLOAT) [3136,1024]
...
Variable_2 is exactly wd1, but there are two more copies of it for the Adam optimizer. These extra variables are created by the Adam optimizer; they are called slots and hold the m and v accumulators for every trainable variable, so each trainable variable is effectively stored three times. Now the total size makes sense.
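To see where those copies come from, here is a minimal TF 1.x sketch (the variable name, shape and learning rate are just illustrative, not the actual model):

import tensorflow as tf

# Build a toy graph with a single weight matrix the size of FC1.
wd1 = tf.Variable(tf.zeros([7 * 7 * 64, 1024]), name='wd1')
loss = tf.reduce_sum(tf.square(wd1))              # dummy loss, just to have gradients
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Every trainable variable now has two extra Adam accumulators of the same shape.
for v in tf.global_variables():
    print(v.name, v.shape)
# Prints (roughly):
#   wd1:0          (3136, 1024)
#   beta1_power:0  ()
#   beta2_power:0  ()
#   wd1/Adam:0     (3136, 1024)   <- the 'm' accumulator
#   wd1/Adam_1:0   (3136, 1024)   <- the 'v' accumulator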
You can run the following code to compute the total size of the graph variables, which comes out to 37.47MB here:
import numpy as np
import tensorflow as tf
# Bytes per variable: number of elements times bytes per element (4 for float32).
var_sizes = [np.prod(v.shape.as_list()) * v.dtype.size
             for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
print(sum(var_sizes) / (1024 ** 2), 'MB')
So the checkpoint overhead itself is actually pretty small; the extra size comes from the optimizer.
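As a side note, if you only need the checkpoint for inference, you can leave the Adam slots out by passing an explicit variable list to the Saver. A rough sketch (the variable, loss and path are just placeholders):

import tensorflow as tf

wd1 = tf.Variable(tf.zeros([7 * 7 * 64, 1024]), name='wd1')   # stand-in for the real weights
loss = tf.reduce_sum(tf.square(wd1))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# A Saver restricted to the trainable variables skips Adam's m/v slots,
# so the checkpoint shrinks back to roughly the raw parameter size.
saver = tf.train.Saver(var_list=tf.trainable_variables())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, '/tmp/model_without_adam.ckpt')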