When training convolutional neural networks for image classification we generally want the algorithm to learn the filters (and biases) that transform a given image into its predicted class. Adding up all those variables, we would expect to get a model.ckpt.data file of about 12.45MB.
Traditionally, most of a model's parameters are in the first fully connected layer, in this case wd1. Computing its size alone yields:

7*7*128 * 1024 * 4 = 25690112

... i.e. roughly 25.7MB. Note the factor of 4: the variable has dtype=tf.float32, i.e. 4 bytes per parameter. Other layers also contribute to the model size, but not so drastically.
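The same arithmetic as a quick Python sketch (using the layer shape assumed above):

# Number of weights in wd1 times 4 bytes per float32 parameter.
fc1_bytes = 7 * 7 * 128 * 1024 * 4
print(fc1_bytes, fc1_bytes / 1e6, 'MB')   # 25690112 bytes, ~25.7 MB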
As you can see, your estimate of 12.45MB is a bit off (did you use 16 bits per parameter?). The checkpoint also stores some general bookkeeping information, hence an overhead of around 25%, which is still big, but not 300%.
[Update]
As was clarified, the model in question actually has an FC1 layer of shape [7*7*64, 1024]. So the size calculated above should indeed come out to roughly 12.5MB. That made me look into the saved checkpoint more carefully.
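If you want to inspect a checkpoint yourself, here is a minimal sketch; the path 'model.ckpt' is a placeholder, and it assumes a TF version that provides tf.train.list_variables:

import tensorflow as tf

# List the name and shape of every tensor stored in the checkpoint.
for name, shape in tf.train.list_variables('model.ckpt'):
    print(name, shape)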
After inspecting it, I noticed other big variables that I missed originally:
...
Variable_2 (DT_FLOAT) [3136,1024]
Variable_2/Adam (DT_FLOAT) [3136,1024]
Variable_2/Adam_1 (DT_FLOAT) [3136,1024]
...
Variable_2 is exactly wd1, but there are two more copies of it for the Adam optimizer. These extra variables are created by the Adam optimizer; they are called slots and hold the m and v accumulators for every trainable variable, so each trainable variable is effectively stored three times. Now the total size makes sense.
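To see where those copies come from, here is a minimal TF 1.x sketch (the variable name, shape and learning rate are just illustrative, not the actual model):

import tensorflow as tf

# Build a toy graph with a single weight matrix the size of FC1.
wd1 = tf.Variable(tf.zeros([7 * 7 * 64, 1024]), name='wd1')
loss = tf.reduce_sum(tf.square(wd1))              # dummy loss, just to have gradients
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Every trainable variable now has two extra Adam accumulators of the same shape.
for v in tf.global_variables():
    print(v.name, v.shape)
# Prints (roughly):
#   wd1:0          (3136, 1024)
#   beta1_power:0  ()
#   beta2_power:0  ()
#   wd1/Adam:0     (3136, 1024)   <- the 'm' accumulator
#   wd1/Adam_1:0   (3136, 1024)   <- the 'v' accumulator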
You can run the following code to compute the total size of the graph variables, which comes out to 37.47MB here:
import numpy as np
import tensorflow as tf
# Bytes per variable: number of elements times bytes per element (4 for float32).
var_sizes = [np.prod(v.shape.as_list()) * v.dtype.size
             for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
print(sum(var_sizes) / (1024 ** 2), 'MB')
So the checkpoint overhead itself is actually pretty small; the extra size comes from the optimizer.
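As a side note, if you only need the checkpoint for inference, you can leave the Adam slots out by passing an explicit variable list to the Saver. A rough sketch (the variable, loss and path are just placeholders):

import tensorflow as tf

wd1 = tf.Variable(tf.zeros([7 * 7 * 64, 1024]), name='wd1')   # stand-in for the real weights
loss = tf.reduce_sum(tf.square(wd1))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# A Saver restricted to the trainable variables skips Adam's m/v slots,
# so the checkpoint shrinks back to roughly the raw parameter size.
saver = tf.train.Saver(var_list=tf.trainable_variables())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, '/tmp/model_without_adam.ckpt')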