问题
I am getting the above error (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) when I execute the code below. I have cheked if my gpu is woking using tf.test.is_gpu_available
# coding: utf-8
import tensorflow as tf
import numpy as np
import keras
from models import *
import os
import gc
TF_FORCE_GPU_ALLOW_GROWTH = True
np.random.seed(1000)
#Paths
MODEL_CONF = "../models/conf/"
MODEL_WEIGHTS = "../models/weights/"
#Model informations
N_CLASSES = 3
def load_array(name):
return np.load(name, allow_pickle = True)
gc.collect()
dirData = "saved_data/"
trainDir = dirData + "train/"
model = AdaptedLeNet((168, 168, 8), N_CLASSES)
model.summary(print_fn=lambda x: print(x + '\n'))
# Compile the model with the specified loss function.
model.compile(optimizer=keras.optimizers.Adam(),
loss='categorical_crossentropy',
metrics=['accuracy'])
for filename in os.listdir(trainDir):
data = load_array(trainDir + filename)
train = data["a"]
labels = data["b"].astype(int).reshape(-1)
one_hot_targets = np.eye(N_CLASSES)[labels]
model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
gc.collect()
The output of this code is:
Epoch 1/5
2020-04-03 18:50:43.397010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 18:50:43.608330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 18:50:44.274270: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275686: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275747: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
File "cnnAlert.py", line 62, in <module>
model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
validation_freq=validation_freq)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
outs = fit_function(ins_batch)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3727, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
return self._call_impl(args, kwargs)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv2d_1/convolution (defined at /home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_2350]
Function call stack:
keras_scratch_graph
Some more informations:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1660 Off | 00000000:01:00.0 On | N/A |
| 27% 41C P8 9W / 120W | 211MiB / 5911MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 989 G /usr/lib/xorg/Xorg 78MiB |
| 0 1438 G cinnamon 31MiB |
| 0 8622 G ...uest-channel-token=16736224539216711033 99MiB |
+-----------------------------------------------------------------------------+
3
How do I solve this error? Can you help me?
EDIT 1
- CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
- Host compiler version : GCC 7.5.0
- Tensorflow: 2.1.0-rc0;
- CUDNN lib is in my LD_LIBRARY_PATH
回答1:
You might need to set the tensorflow session config.gpu_option.allow_growth to true, which can be done by adding the following to the top of your code:
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)
回答2:
There is an answer on a question about TF1.0 which addresses how to do this for TF2. The suggestion from that answer worked for me, so I'll copy it in here. TF2 seems to be moving away from tf.Session
, so I tend to prefer this suggestion to the other answer here.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
来源:https://stackoverflow.com/questions/61021287/tf-2-could-not-create-cudnn-handle-cudnn-status-internal-error