batchsize

Determining max batch size with TensorFlow Object Detection API

Submitted by 孤街浪徒 on 2021-01-27 19:54:46
Question: The TF Object Detection API grabs all GPU memory by default, so it's difficult to tell how much further I can increase my batch size. Typically I just keep increasing it until I get a CUDA OOM error. PyTorch, on the other hand, doesn't grab all GPU memory by default, so it's easy to see what fraction is left to work with, without all the trial and error. Is there a better way to determine batch size with the TF Object Detection API that I'm missing? Something like an allow-growth flag?
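
A minimal sketch, not part of the original question and assuming TensorFlow 2.x is available: enabling memory growth makes TensorFlow allocate GPU memory on demand, so actual usage shows up in nvidia-smi while the batch size is tuned.

    import tensorflow as tf

    # Allocate GPU memory as needed instead of grabbing it all up front.
    # This must run before any GPU has been initialized.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

For the TF 1.x Object Detection API, the equivalent knob is gpu_options.allow_growth on a tf.ConfigProto, passed to the training script's tf.estimator.RunConfig as session_config.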

Keras compute_output_shape not working for custom layer

Submitted by 北战南征 on 2021-01-27 19:40:35
Question: I wrote a custom layer that merges the batch_size with the first dimension and leaves the other dimensions unchanged, but compute_output_shape seems to have no effect, so the subsequent layer cannot obtain accurate shape information and raises an error. How do I make compute_output_shape work?

    import keras
    from keras import backend as K

    class BatchMergeReshape(keras.layers.Layer):
        def __init__(self, **kwargs):
            super(BatchMergeReshape, self).__init__(**kwargs)
        def build(self, input_shape
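
The excerpt cuts the code off above. As a hedged illustration only (the call and compute_output_shape bodies below are assumptions, not the asker's actual code), a reshape layer that folds the first dimension into the batch dimension and reports the resulting static shape could look like this:

    import keras
    from keras import backend as K

    class BatchMergeReshape(keras.layers.Layer):
        def __init__(self, **kwargs):
            super(BatchMergeReshape, self).__init__(**kwargs)

        def call(self, inputs):
            # Dynamic reshape: (batch, d1, d2, ...) -> (batch * d1, d2, ...)
            shape = K.shape(inputs)
            new_shape = K.concatenate([shape[0:1] * shape[1:2], shape[2:]])
            return K.reshape(inputs, new_shape)

        def compute_output_shape(self, input_shape):
            # batch * d1 is unknown until run time, so it is reported as None
            return (None,) + tuple(input_shape[2:])

For an input of shape (batch, 10, 64) this reports an output shape of (None, 64). Whether compute_output_shape is consulted at all depends on the Keras/TensorFlow version and on how the layer is used, so this sketch may not address the exact error in the original post.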

Is using a batch size that is a power of 2 faster in TensorFlow?

Submitted by 空扰寡人 on 2019-12-01 17:27:11
Question: I read somewhere that if you choose a batch size that is a power of 2, training will be faster. What is this rule? Does it apply to other applications? Can you provide a reference paper?

Answer 1: Algorithmically speaking, using larger mini-batches allows you to reduce the variance of your stochastic gradient updates (by averaging the gradients within the mini-batch), which in turn allows you to take bigger step sizes, meaning the optimization algorithm makes progress faster. However, the amount of work done (in terms of the number of gradient computations) to reach a certain
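
An editorial aside not in the original answer (a rough sketch with placeholder data and model): the claimed hardware-level effect of power-of-2 batch sizes can be checked empirically by timing an epoch at a few batch sizes.

    import time
    import numpy as np
    import tensorflow as tf

    # Placeholder data and model; substitute your own for a realistic benchmark.
    x = np.random.rand(8192, 128).astype("float32")
    y = np.random.randint(0, 10, size=(8192,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(128,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    # Mix of power-of-2 and non-power-of-2 batch sizes.
    for batch_size in (48, 64, 96, 128):
        start = time.time()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print("batch_size=%d: %.2fs per epoch" % (batch_size, time.time() - start))

Any speed difference, when present, usually comes from GPU memory alignment and kernel tile sizes rather than from the optimization algorithm itself, and for small models it is often negligible.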
