How to fix this strange error: “RuntimeError: CUDA error: out of memory”

时光取名叫无心 2021-02-12 22:59

I ran code for a deep learning network. First I trained the network and it worked well, but this error occurs when running the validation of the network.

I have five epochs.

6 Answers
  •  轻奢々
     2021-02-12 23:41

    It might happen for a number of reasons; I list the common ones below:

    1. Module parameters: check the dimensions of your modules. A linear layer that transforms a big input tensor (e.g., size 1000) into another big output tensor (e.g., size 1000) requires a weight matrix of size (1000, 1000).
    2. RNN decoder maximum steps: if you're using an RNN decoder in your architecture, avoid looping for a large number of steps. Usually you fix a number of decoding steps that is reasonable for your dataset (see the first sketch after this list).
    3. Tensor usage: minimise the number of tensors that you create. The garbage collector won't release them until they go out of scope (see the second sketch after this list).
    4. Batch size: incrementally increase your batch size until you run out of memory. It's a common trick that even well-known libraries implement (see the biggest_batch_first description for the BucketIterator in AllenNLP, and the third sketch after this list).
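
    For point 2, here is a minimal sketch of a greedy decoding loop with a hard step cap. The MAX_DECODE_STEPS and EOS_INDEX values, and the GRUCell-based decoder, are assumptions for illustration, not the asker's actual architecture:

    ```python
    import torch
    import torch.nn as nn

    MAX_DECODE_STEPS = 50   # assumed cap; pick one reasonable for your dataset
    EOS_INDEX = 1           # assumed end-of-sequence token index

    def greedy_decode(cell: nn.GRUCell, proj: nn.Linear, embed: nn.Embedding,
                      hidden: torch.Tensor, start_token: torch.Tensor):
        tokens = []
        current = start_token
        for _ in range(MAX_DECODE_STEPS):      # hard cap: never loop unbounded
            hidden = cell(embed(current), hidden)
            current = proj(hidden).argmax(dim=-1)
            tokens.append(current)
            if (current == EOS_INDEX).all():   # stop early once every sequence ends
                break
        return torch.stack(tokens, dim=1)
    ```

    Without the cap, a decoder that never emits EOS keeps allocating one hidden state per step until the GPU fills up.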
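
    For point 3 (and for the asker's exact symptom of training succeeding but validation failing), a common culprit is accumulating loss tensors, which keeps the whole autograd graph alive. Here is a minimal sketch of a memory-friendly validation loop, assuming a standard PyTorch model, DataLoader, and criterion:

    ```python
    import torch

    def validate(model, loader, criterion, device):
        model.eval()                       # switch off dropout / batch-norm updates
        total_loss = 0.0
        with torch.no_grad():              # no autograd graph -> far less GPU memory
            for inputs, targets in loader:
                inputs, targets = inputs.to(device), targets.to(device)
                loss = criterion(model(inputs), targets)
                total_loss += loss.item()  # accumulate a Python float, not a tensor
        return total_loss / len(loader)
    ```

    Accumulating `loss` itself instead of `loss.item()` would retain every batch's computation graph until the loop's variables go out of scope.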
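
    For point 4, a sketch of probing the largest batch size that fits on the GPU. `make_batch` is a hypothetical helper that builds a batch of a given size; the doubling schedule is one simple choice among many:

    ```python
    import torch

    def find_max_batch_size(model, make_batch, device, start=1, limit=4096):
        # Double the batch size until CUDA runs out of memory, then back off.
        model.to(device)
        batch_size = start
        while batch_size <= limit:
            try:
                inputs = make_batch(batch_size).to(device)
                model(inputs).sum().backward()   # forward + backward, like a real step
                model.zero_grad()
                batch_size *= 2
            except RuntimeError as err:
                if "out of memory" in str(err):
                    torch.cuda.empty_cache()     # release the failed allocation
                    return batch_size // 2       # last size that fit
                raise
        return limit
    ```

    Running the backward pass in the probe matters: gradients and saved activations roughly double the memory of a forward-only pass, so a forward-only probe would overestimate what training can handle.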

    In addition, I would recommend having a look at the official PyTorch documentation: https://pytorch.org/docs/stable/notes/faq.html
