could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Asked by 故里飘歌 on 2020-12-01 16:01 (19 answers, 2400 views)

I installed TensorFlow 1.0.1 (GPU version) on my MacBook Pro with a GeForce GT 750M, along with CUDA 8.0.71 and cuDNN 5.1. I am running a TF script that works fine with non C…

19 answers
  • 2020-12-01 16:33

    This is a cuDNN compatibility issue. Check which GPU-enabled package you installed (for instance, tensorflow-gpu) and its version. Is that version compatible with your cuDNN, and is your cuDNN the right build for your CUDA?

    I have observed, for instance:

    • cuDNN v7.0.3 for CUDA 7.*
    • cuDNN v7.1.2 for CUDA 9.0
    • cuDNN v7.3.1 for CUDA 9.1, and so on

    So also check that your TensorFlow version matches your CUDA configuration. For instance, using tensorflow-gpu: TF v1.4 for cuDNN 7.0.*, TF v1.7 and above for CUDA 9.0, etc.

    So all you need to do is reinstall the matching cuDNN version. Hope it helps!
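    The matching exercise above can be sketched as a small lookup table. A minimal sketch, assuming a hand-maintained table (the version pairs below are illustrative examples of TensorFlow's tested builds, not an authoritative list; check the official TensorFlow and NVIDIA compatibility pages for your exact release):

```python
# Illustrative compatibility table: TF release -> CUDA/cuDNN pair it was
# built against. The entries are examples, not an authoritative list.
TF_COMPAT = {
    "1.4":  {"cuda": "8.0",  "cudnn": "6"},
    "1.12": {"cuda": "9.0",  "cudnn": "7"},
    "2.0":  {"cuda": "10.0", "cudnn": "7.4"},
}

def required_stack(tf_version):
    """Return the CUDA/cuDNN pair a given TF release expects, if known."""
    return TF_COMPAT.get(tf_version)

print(required_stack("2.0"))  # {'cuda': '10.0', 'cudnn': '7.4'}
```

    If your installed CUDA/cuDNN differ from the pair your TF build expects, reinstalling the matching cuDNN (or switching TF versions) is usually the fix.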

  • 2020-12-01 16:34

    In TensorFlow 2.0, my issue was resolved by enabling memory growth. ConfigProto is deprecated in TF 2.0, so I used tf.config.experimental. My computer specs are:

    • OS: Ubuntu 18.04
    • GPU: GeForce RTX 2070
    • Nvidia Driver: 430.26
    • Tensorflow: 2.0
    • Cudnn: 7.6.2
    • Cuda: 10.0

    The code I used was:

    import tensorflow as tf

    physical_devices = tf.config.experimental.list_physical_devices('GPU')
    assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
    # set_memory_growth returns None; it configures the device in place
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
    
  • 2020-12-01 16:35

    For me, the 4th option below solved the problem nicely. https://blog.csdn.net/comway_Li/article/details/102953634?utm_medium=distribute.pc_relevant.none-task-blog-baidujs-2

    1.
        import tensorflow as tf

        # cap the fraction of GPU memory TF may allocate per process
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 1.0
        session = tf.Session(config=config)
    
    2.
        import tensorflow as tf

        # allocate GPU memory incrementally instead of grabbing it all upfront
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
    
    3.
        # remove the NVIDIA compute cache directory (it can become corrupted)
        sudo rm -rf ~/.nv
    
    4.
        # TF 2.x: use the compat.v1 imports; on TF 1.x, import from tensorflow directly
        from tensorflow.compat.v1 import ConfigProto
        from tensorflow.compat.v1 import InteractiveSession

        config = ConfigProto()
        config.gpu_options.allow_growth = True
        session = InteractiveSession(config=config)
    
  • 2020-12-01 16:35

    Rebooting the machine worked for me. Try this:

    sudo reboot
    

    Then, re-run the code

  • 2020-12-01 16:38

    I too encountered the same problem:

    Using TensorFlow backend.
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
    name: GeForce GTX 1050
    major: 6 minor: 1 memoryClockRate (GHz) 1.493 pciBusID 0000:01:00.0
    Total memory: 3.95GiB
    Free memory: 3.60GiB
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
    E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
    F tensorflow/core/kernels/conv_ops.cc:532] Check failed:  stream->parent()->GetConvolveAlgorithms(&algorithms)
    
    Aborted (core dumped)
    

    But in my case, running the command with sudo worked perfectly fine.

  • 2020-12-01 16:41

    I ran into the same problem because my GPU memory was being held by a background zombie/terminated process; killing those processes worked for me:

    ps aux | grep 'Z'  # zombie processes (STAT column contains Z)
    ps aux | grep 'T'  # stopped/traced processes (STAT column contains T)
    kill -9 your_zombie_or_terminated_process_id
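    Note that grepping the whole `ps aux` line can match unrelated text (e.g. a 'Z' anywhere in the command name). A minimal sketch of a stricter filter that inspects only the STAT column (`stuck_pids` is a hypothetical helper, not part of any library):

```python
def stuck_pids(ps_aux_output):
    """Return PIDs whose STAT column (8th field of `ps aux`) starts with
    Z (zombie) or T (stopped/traced)."""
    pids = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)            # COMMAND may contain spaces
        if len(fields) > 7 and fields[7][0] in ("Z", "T"):
            pids.append(int(fields[1]))
    return pids

# Example with fabricated ps output:
sample = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "me 1234 0.0 0.0 0 0 ? Z 10:00 0:00 [python] <defunct>\n"
    "me 5678 0.1 2.0 100 50 ? Sl 10:01 0:02 python train.py\n"
)
print(stuck_pids(sample))  # [1234]
```

    True zombies cannot be killed directly (they are already dead; their parent must reap them), but stopped (T) processes holding GPU memory can be killed as shown above.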
    