I ran code for a deep learning network. First I trained the network and it worked well, but this error occurs when running the validation step.
I train for five epochs.
It might happen for a number of reasons. One thing to check is the biggest_batch_first option described for the BucketIterator in AllenNLP: it makes the largest batch come first, so an out-of-memory error surfaces immediately instead of partway through training. In addition, I would recommend having a look at the official PyTorch documentation: https://pytorch.org/docs/stable/notes/faq.html
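As a sketch only, assuming the pre-1.0 AllenNLP API (the iterators module was removed in later releases) and an existing Vocabulary named vocab:
from allennlp.data.iterators import BucketIterator

iterator = BucketIterator(
    sorting_keys=[("tokens", "num_tokens")],  # bucket instances by sequence length
    batch_size=32,
    biggest_batch_first=True,  # yield the largest batch first so OOM errors surface immediately
)
iterator.index_with(vocab)  # vocab: an allennlp.data.Vocabulary built from your dataset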
The best way is to find the process occupying GPU memory and kill it:
find the PID of the python process with:
nvidia-smi
then copy the PID and kill it with:
sudo kill -9 <pid>
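If your nvidia-smi version supports query flags (check nvidia-smi --help-query-compute-apps; the field names below are my assumption from that listing), you can narrow the output to compute processes only:
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv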
1. When you only perform validation, not training, you don't need to calculate gradients for the forward and backward passes. In that situation, your code can be placed under a torch.no_grad() block:
with torch.no_grad():
    ...
    net = Net()
    pred_for_validation = net(input)
    ...
The code above doesn't record a computation graph, so it avoids the GPU memory that gradient bookkeeping would otherwise consume.
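A minimal runnable sketch of a full validation pass under this pattern; the linear model and random tensors are toy stand-ins to swap for your own model and data:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)          # placeholder model
criterion = nn.CrossEntropyLoss()
val_loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)

model.eval()                      # disable dropout, use running batch-norm statistics
val_loss = 0.0
with torch.no_grad():             # no computation graph is recorded inside this block
    for inputs, labels in val_loader:
        outputs = model(inputs)
        val_loss += criterion(outputs, labels).item()  # .item() returns a plain Python float
print(val_loss / len(val_loader))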
2. If you use the += operator on a loss tensor in your code, it can keep the gradient graph of every iteration alive, accumulating memory across the loop. In that case you need to convert the tensor with float(), as described here:
https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory
Even though the docs suggest float(), in my case item() also worked:
entire_loss = 0.0
for i in range(100):
    one_loss = loss_function(prediction, label)
    entire_loss += one_loss.item()
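A tiny runnable sketch of the difference; the quadratic loss here is only an illustration:
import torch

x = torch.randn(4, requires_grad=True)
total = 0.0
for _ in range(3):
    one_loss = (x ** 2).sum()    # a tensor still attached to the autograd graph
    # total += one_loss          # would keep every iteration's graph alive in memory
    total += one_loss.item()     # .item() (or float(one_loss)) extracts a detached scalar
print(total)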
3. If you use a for loop in your training code, data can be retained until the entire loop ends. So, in that case, you can explicitly delete intermediate variables after performing optimizer.step():
for one_epoch in range(100):
    ...
    optimizer.step()
    del intermediate_variable1, intermediate_variable2, ...
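A hedged sketch of how that can look in a concrete loop; model, optimizer, criterion, and train_loader stand for your own objects, and the torch.cuda.empty_cache() call is an optional extra:
import torch

for one_epoch in range(100):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)          # intermediate tensors are created here
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        del outputs, loss                # drop references so the memory can be reclaimed
    torch.cuda.empty_cache()             # optional: return cached blocks to the GPU driver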
I faced the same issue on my computer. All you have to do is customize the cfg file to suit your machine. It turns out my computer can only handle an image size below 600 x 600, and when I adjusted this in the config file, the program ran smoothly.
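The answer doesn't name the framework, but cfg files like this are typical of darknet/YOLO; purely as a hypothetical illustration, the input resolution there lives under the [net] section:
[net]
batch=64
subdivisions=16
width=416    # lowered from 608 so the network fits in GPU memory
height=416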
If someone arrives here because of fast.ai, the batch size of a loader such as ImageDataLoaders can be controlled via bs=N, where N is the size of the batch. My dedicated GPU is limited to 2GB of memory; using bs=8 in the following example worked in my situation:
from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'
def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(244), num_workers=0, bs=8)
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
The error you have provided is shown because you ran out of memory on your GPU. A way to solve it is to reduce the batch size until your code runs without this error.
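For example, in plain PyTorch the batch size is the batch_size argument of the DataLoader; a minimal sketch with random placeholder data, where 16 is just a starting value to lower until the error disappears:
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3, 224, 224), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # reduce batch_size on OOM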