batchsize

Determining max batch size with TensorFlow Object Detection API

Submitted by 孤街浪徒 on 2021-01-27 19:54:46
Question: The TF Object Detection API grabs all GPU memory by default, so it's difficult to tell how much further I can increase my batch size. Typically I just keep increasing it until I get a CUDA OOM error. PyTorch, on the other hand, doesn't grab all GPU memory by default, so it's easy to see what fraction is left to work with, without all the trial and error. Is there a better way to determine batch size with the TF Object Detection API that I'm missing? Something like an allow-growth flag?
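
A minimal sketch, not part of the original question and assuming TensorFlow 2.x is available: enabling memory growth makes TensorFlow allocate GPU memory on demand, so actual usage shows up in nvidia-smi while the batch size is tuned.

    import tensorflow as tf

    # Allocate GPU memory as needed instead of grabbing it all up front.
    # This must run before any GPU has been initialized.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

For the TF 1.x Object Detection API, the equivalent knob is gpu_options.allow_growth on a tf.ConfigProto, passed to the training script's tf.estimator.RunConfig as session_config.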

Keras compute_output_shape not working for custom layer

Submitted by 北战南征 on 2021-01-27 19:40:35
Question: I wrote a custom layer that merges the batch_size with the first dimension and leaves the other dimensions unchanged, but compute_output_shape seems to have no effect, so the subsequent layer cannot obtain accurate shape information and raises an error. How do I make compute_output_shape work?

    import keras
    from keras import backend as K

    class BatchMergeReshape(keras.layers.Layer):
        def __init__(self, **kwargs):
            super(BatchMergeReshape, self).__init__(**kwargs)
        def build(self, input_shape
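
The excerpt cuts the code off above. As a hedged illustration only (the call and compute_output_shape bodies below are assumptions, not the asker's actual code), a reshape layer that folds the first dimension into the batch dimension and reports the resulting static shape could look like this:

    import keras
    from keras import backend as K

    class BatchMergeReshape(keras.layers.Layer):
        def __init__(self, **kwargs):
            super(BatchMergeReshape, self).__init__(**kwargs)

        def call(self, inputs):
            # Dynamic reshape: (batch, d1, d2, ...) -> (batch * d1, d2, ...)
            shape = K.shape(inputs)
            new_shape = K.concatenate([shape[0:1] * shape[1:2], shape[2:]])
            return K.reshape(inputs, new_shape)

        def compute_output_shape(self, input_shape):
            # batch * d1 is unknown until run time, so it is reported as None
            return (None,) + tuple(input_shape[2:])

For an input of shape (batch, 10, 64) this reports an output shape of (None, 64). Whether compute_output_shape is consulted at all depends on the Keras/TensorFlow version and on how the layer is used, so this sketch may not address the exact error in the original post.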

Is using a batch size that is a power of 2 faster in TensorFlow?

Submitted by 空扰寡人 on 2019-12-01 17:27:11
Question: I read somewhere that if you choose a batch size that is a power of 2, training will be faster. What is this rule? Does it apply to other applications? Can you provide a reference paper?

Answer 1: Algorithmically speaking, using larger mini-batches allows you to reduce the variance of your stochastic gradient updates (by averaging the gradients within the mini-batch), which in turn allows you to take bigger step sizes, meaning the optimization algorithm makes progress faster. However, the amount of work done (in terms of the number of gradient computations) to reach a certain
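
An editorial aside not in the original answer (a rough sketch with placeholder data and model): the claimed hardware-level effect of power-of-2 batch sizes can be checked empirically by timing an epoch at a few batch sizes.

    import time
    import numpy as np
    import tensorflow as tf

    # Placeholder data and model; substitute your own for a realistic benchmark.
    x = np.random.rand(8192, 128).astype("float32")
    y = np.random.randint(0, 10, size=(8192,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(128,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    # Mix of power-of-2 and non-power-of-2 batch sizes.
    for batch_size in (48, 64, 96, 128):
        start = time.time()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print("batch_size=%d: %.2fs per epoch" % (batch_size, time.time() - start))

Any speed difference, when present, usually comes from GPU memory alignment and kernel tile sizes rather than from the optimization algorithm itself, and for small models it is often negligible.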
