I installed TensorFlow 1.0.1 (GPU version) on my MacBook Pro with a GeForce GT 750M, along with CUDA 8.0.71 and cuDNN 5.1. I am running a tf code that works fine with non-C…
I encountered this problem when I accidentally installed libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb (built for CUDA 9.2) instead of libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb on a system with CUDA 9.0 installed.
I got there because I had CUDA 9.2 installed and had downgraded to CUDA 9.0; evidently libcudnn is specific to the CUDA version.
I have managed to get it working by deleting the .nv folder in my home folder:
sudo rm -rf ~/.nv/
In my case, after checking the cuDNN and CUDA versions, I found my GPU was out of memory. Running watch -n 0.1 nvidia-smi
in another bash terminal, I could see that the moment the error 2019-07-16 19:54:05.122224: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
appeared was exactly when GPU memory was nearly full.
So I configured a limit on how much GPU memory TensorFlow can use. Since I use the tf.keras
module, I added the following code to the beginning of my program:
import tensorflow as tf

# Cap how much GPU memory TensorFlow may allocate for this process
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
tf.keras.backend.set_session(tf.Session(config=config))
Then, problem solved!
You can also reduce your batch_size
or use smarter ways to feed your training data (such as tf.data.Dataset
with caching; a sketch follows below). I hope my answer can help someone else.
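For the tf.data.Dataset approach mentioned above, a minimal sketch might look like this (the x_train / y_train arrays are hypothetical placeholders, not from the original answer; the idea is just caching plus a modest batch size):

import numpy as np
import tensorflow as tf

# Hypothetical in-memory training data; replace with your own arrays.
x_train = np.random.rand(1000, 28, 28).astype(np.float32)
y_train = np.random.randint(0, 10, size=(1000,)).astype(np.int32)

dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .cache()                    # reuse preprocessed data after the first pass
           .shuffle(buffer_size=1000)
           .batch(32)                  # a smaller batch size also lowers peak GPU memory
           .prefetch(1))               # overlap input preparation with training

# In recent TF 1.x versions a tf.keras model accepts the dataset directly,
# e.g. model.fit(dataset, epochs=5, steps_per_epoch=1000 // 32)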
It has to do with the memory fraction available for loading GPU resources to create the cuDNN handle, controlled by per_process_gpu_memory_fraction.
Reducing this memory fraction yourself will solve the error.
> sess_config = tf.ConfigProto(
>     gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
>     allow_soft_placement=True)
>
> with tf.Session(config=sess_config) as sess:
>     sess.run([whatever])
Use as small a fraction as fits in your memory. (In the code above I use 0.7; you can start with 0.3 or even smaller, then increase until you get the same error again; that's your limit.)
Pass it as config to your tf.Session()
or tf.train.MonitoredTrainingSession()
or Supervisor's sv.managed_session().
This should allow your GPU to create a cuDNN handle for your TensorFlow code.
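For example, here is a minimal sketch of passing the same config to tf.train.MonitoredTrainingSession (the train_op below is a hypothetical placeholder for your own training graph):

import tensorflow as tf

sess_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
    allow_soft_placement=True)

# Hypothetical stand-in for a real training op.
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = tf.assign_add(global_step, 1)

# MonitoredTrainingSession initializes variables and accepts the same config.
with tf.train.MonitoredTrainingSession(config=sess_config) as sess:
    sess.run(train_op)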
In my case the problem seems to have been caused by a TensorFlow and cuDNN version mismatch. The following helped me (I was working on Ubuntu 16.04 with an NVIDIA Tesla K80 on Google Cloud; TensorFlow 1.5 finally worked with cuDNN 7.0.4 and CUDA 9.0):
Remove cuDNN completely:
sudo rm /usr/local/cuda/include/cudnn.h
sudo rm /usr/local/cuda/lib64/libcudnn*
After doing so, import tensorflow should raise an error.
Download the appropriate cuDNN version. Note that there is cuDNN 7.0.4 for CUDA 9.0 and cuDNN 7.0.4 for CUDA 8.0. You should choose the one corresponding to your CUDA version. Be careful at this step or you'll get a similar problem again. Install cuDNN as usual:
tar -xzvf cudnn-9.0-linux-x64-v7.tgz
cd cuda
sudo cp -P include/cudnn.h /usr/include
sudo cp -P lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*
In this example I've installed cuDNN 7.0.x for CUDA 9.0 (x actually doesn't matter). Take care to match your CUDA version.
Restart the computer. In my case the problem vanished. If the error still occurs, consider installing another version of TensorFlow.
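To confirm that TensorFlow can actually create a cuDNN handle after reinstalling, a quick sanity check that forces one convolution onto the GPU can help (this is just an illustrative sketch, not part of the original answer):

import tensorflow as tf

# A tiny convolution forces TensorFlow to initialize cuDNN on the GPU.
with tf.device('/gpu:0'):
    x = tf.random_normal([1, 32, 32, 3])
    w = tf.random_normal([3, 3, 3, 8])
    y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

# allow_soft_placement=False makes the run fail loudly if the GPU/cuDNN path is broken.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=False)) as sess:
    print(sess.run(y).shape)  # expected: (1, 32, 32, 8)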
Hope this helps someone.
Please remember to close your TensorBoard terminal/cmd or any other terminals that interact with the training directory. Then you can restart the training and it should work.