It might happen for a number of reasons, which I try to summarise in the following list:
- Module parameters: check the dimensions of your modules. A linear layer that transforms a big input tensor (e.g., size 1000) into another big output tensor (e.g., size 1000) will require a weight matrix of size (1000, 1000); a rough parameter count is sketched after this list.
- RNN decoder maximum steps: if you're using an RNN decoder in your architecture, avoid looping for a large number of steps. Usually, you fix a maximum number of decoding steps that is reasonable for your dataset (see the capped decoding loop sketched below).
- Tensor usage: minimise the number of tensors that you create. The garbage collector won't release them until they go out of scope (see the loss-accumulation sketch below).
- Batch size: incrementally increase your batch size until you run out of memory. It's a common trick that even famous libraries implement (see the `biggest_batch_first` description for the BucketIterator in AllenNLP); a minimal probe is sketched below.
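As a rough illustration of the first point, here is a small sketch (the sizes are made up) of how quickly a single `nn.Linear(1000, 1000)` adds up:

```python
import torch.nn as nn

# A single Linear layer mapping a 1000-dim input to a 1000-dim output
# already holds a (1000, 1000) weight matrix plus 1000 biases.
layer = nn.Linear(1000, 1000)
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 1001000 -> ~4 MB in float32, before gradients and optimizer state
```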
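For the decoder point, a minimal sketch of a decoding loop with a hard cap, assuming a toy GRU decoder (`vocab_size`, `hidden_size`, `max_decoding_steps` and `sos_idx` are made-up values, not from your setup):

```python
import torch
import torch.nn as nn

# Bound the loop by max_decoding_steps instead of looping until an
# end-of-sequence token happens to be produced.
vocab_size, hidden_size, max_decoding_steps, sos_idx = 5000, 256, 50, 1

embedding = nn.Embedding(vocab_size, hidden_size)
cell = nn.GRUCell(hidden_size, hidden_size)
output_proj = nn.Linear(hidden_size, vocab_size)

hidden = torch.zeros(1, hidden_size)    # would normally come from the encoder
token = torch.tensor([sos_idx])
predictions = []
for _ in range(max_decoding_steps):     # hard upper bound on memory and compute
    hidden = cell(embedding(token), hidden)
    token = output_proj(hidden).argmax(dim=-1)
    predictions.append(token.item())
```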
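For tensor usage, a minimal sketch (with a toy model) of the accumulation pattern also discussed in the linked FAQ; the key line is the `.item()` call:

```python
import torch
import torch.nn as nn

# Store Python numbers, not tensors, when accumulating statistics,
# so the computation graph behind each loss can be freed.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

total_loss = 0.0
for _ in range(100):
    inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()  # `total_loss += loss` would keep every graph alive
```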
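For the batch-size point, a rough probe you could adapt (it assumes a CUDA device and uses a made-up model; it is not AllenNLP's actual implementation):

```python
import torch
import torch.nn as nn

# Keep doubling the batch size until an out-of-memory error is raised,
# then keep the last size that worked.
model = nn.Linear(1000, 1000).cuda()
batch_size, last_good = 1, None
while True:
    try:
        x = torch.randn(batch_size, 1000, device="cuda")
        model(x).sum().backward()
        last_good = batch_size
        batch_size *= 2
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        torch.cuda.empty_cache()
        break
print(f"largest batch size that fits: {last_good}")
```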
In addition, I would recommend you have a look at the official PyTorch documentation: https://pytorch.org/docs/stable/notes/faq.html