How to compensate if I can't do a large batch size in neural network

Submitted by 大城市里の小女人 on 2021-01-29 05:34:44

Question


I am trying to run an action recognition code from GitHub. The original code used a batch size of 128 with 4 GPUs. I only have two GPUs, so I cannot match their batch size. Is there any way I can compensate for this difference in batch size? I saw somewhere that iter_size might compensate, according to the formula effective_batchsize = batch_size * iter_size * n_gpu. What is iter_size in this formula? I am using PyTorch, not Caffe.
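For reference, here is how the numbers in that formula would work out, assuming iter_size is the number of gradient-accumulation steps per optimiser update (its meaning in Caffe); the values below are purely illustrative:

batch_size = 32      # hypothetical per-GPU batch size
iter_size = 2        # assumed: gradient-accumulation steps before each update
n_gpu = 2            # GPUs available
effective_batchsize = batch_size * iter_size * n_gpu   # 32 * 2 * 2 = 128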


Answer 1:


In PyTorch, when you perform the backward step (calling loss.backward() or similar), the gradients are accumulated in place. This means that if you call loss.backward() multiple times, the previously computed gradients are not replaced; instead, the new gradients are added to the previous ones. That is why, when using PyTorch, it is usually necessary to explicitly zero the gradients between minibatches (by calling optimiser.zero_grad() or similar).
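As a quick illustration of that accumulation behaviour, here is a minimal sketch using a made-up one-parameter tensor (not from the original code):

import torch

w = torch.tensor([1.0], requires_grad=True)
(w * 2).backward()
print(w.grad)    # tensor([2.])
(w * 3).backward()
print(w.grad)    # tensor([5.]) -- the new gradient was added to the old one
w.grad.zero_()   # this is what optimiser.zero_grad() does for every parameter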

If your batch size is limited, you can simulate a larger batch size by breaking a large batch up into smaller pieces, and only calling optimiser.step() to update the model parameters after all the pieces have been processed.

For example, suppose you are only able to do batches of size 64, but you wish to simulate a batch size of 128. If the original training loop looks like:

optimiser.zero_grad()
loss = model(batch_data) # batch_data is a batch of size 128
loss.backward()
optimiser.step()

then you could change this to:

optimiser.zero_grad()

smaller_batches = batch_data[:64], batch_data[64:128]
for batch in smaller_batches:
    loss = model(batch) / 2   # rescale: each half-batch contributes half of the full-batch loss
    loss.backward()           # gradients from the two halves accumulate in place

optimiser.step()

and the updates to the model parameters will be the same in each case (apart, perhaps, from some small floating-point error). Note that you have to rescale the loss (here, dividing by 2) so that the overall update stays the same.
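The same pattern generalises to any number of accumulation steps. A hedged sketch of the general version (the names accumulation_steps and small_batches are illustrative, not from the original code):

accumulation_steps = 4                        # e.g. simulate 4x the per-step batch size
optimiser.zero_grad()
for batch in small_batches:                   # the accumulation_steps pieces of one large logical batch
    loss = model(batch) / accumulation_steps  # rescale so the summed gradients match the large batch
    loss.backward()                           # gradients accumulate across the pieces
optimiser.step()                              # one parameter update for the whole large batch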




Answer 2:


The important concept is not so much the batch size as the number of epochs you train for. Can you double the per-GPU batch size, so that your two GPUs give you the same overall batch size of 128? If so, that compensates for the problem directly. If not, double the number of iterations, so that you train for the same number of epochs. The model will quickly overcome the effects of the early-batch bias.
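To make the "same number of epochs" point concrete, a small back-of-the-envelope calculation (the dataset size and epoch count below are made up for illustration):

num_samples = 100000                          # hypothetical dataset size
epochs = 30
iters_at_128 = epochs * num_samples // 128    # iterations needed at the original batch size
iters_at_64 = epochs * num_samples // 64      # twice as many iterations at half the batch size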

However, if you are comfortable digging into the training code, myrtlecat gave you an answer that will eliminate the batch-size difference quite nicely.



Source: https://stackoverflow.com/questions/52518324/how-to-compensate-if-i-cant-do-a-large-batch-size-in-neural-network
