Question
I am trying to run some TensorFlow 2.2 example code on a Databricks GPU cluster (p2.xlarge) with the following environment:
Databricks Runtime 6.6 ML, Spark 2.4.5, GPU, Scala 2.11
Keras version: 2.2.5
Output of nvidia-smi:
NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2
I have checked https://docs.databricks.com/applications/deep-learning/single-node-training/tensorflow.html#install-tensorflow-22-on-databricks-runtime-66-ml&language-GPU
However, I do not want to rerun the shell commands every time the Databricks GPU cluster is restarted, so I installed TensorFlow through the Databricks Libraries UI as:
tensorflow==2.2.*
I did not specify whether this should be the GPU or the CPU build; I assumed it would be the GPU build by default.
I found that the Python 3 code runs only on the CPU, not on the GPU:
import tensorflow as tf
physical_devices = tf.config.list_physical_devices()
physical_devices : [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'), PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
visible_devices = tf.config.get_visible_devices()
visible_devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
tf.test.gpu_device_name()  # returns an empty string
is_built_with_cuda: True
is_built_with_gpu_support: True
is_built_with_rocm: False
is_built_with_xla: True
get_soft_device_placement : True
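The checks above can be gathered into one helper so they are easy to rerun after each library change. This is a minimal sketch and not part of the original post: `gpu_report` is a hypothetical name, and the `find_spec` guard is only there so the snippet also runs where TensorFlow is not installed.

```python
import importlib.util


def gpu_report():
    """Collect the device checks from the question into one dict."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment.
        return {"tensorflow_installed": False}

    import tensorflow as tf

    return {
        "tensorflow_installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Every device TensorFlow can see, including XLA_* devices.
        "physical": [(d.name, d.device_type)
                     for d in tf.config.list_physical_devices()],
        # Only plain 'GPU' devices are used for ordinary op placement.
        "visible_gpus": [d.name
                         for d in tf.config.get_visible_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(key, ":", value)
```

An empty `visible_gpus` list together with `built_with_cuda` being true would match the situation described here: a CUDA-enabled build that has not registered a regular GPU device.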
I tried to make the 'XLA_GPU' device visible to the TensorFlow runtime:
# https://www.tensorflow.org/api_docs/python/tf/config/set_visible_devices
# Make the GPU visible to the TF runtime
physical_devices = tf.config.list_physical_devices('XLA_GPU')
try:
    # Enable the first GPU
    tf.config.set_visible_devices(physical_devices[0], 'XLA_GPU')  # exception here !!!
    logical_devices = tf.config.list_logical_devices('XLA_GPU')
    # A logical device was created for the first GPU
    assert len(logical_devices) == len(physical_devices)
except:
    # Invalid device or cannot modify virtual devices once initialized.
    print('Invalid device or cannot modify virtual devices once initialized.')
However, this raises an exception.
How can I enable the GPU so that TensorFlow code can run on it?
Thanks.
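One thing that may be worth ruling out, since the output reports is_built_with_cuda: True while no plain GPU device is listed, is whether the CUDA shared libraries that TensorFlow loads at runtime can be found on the cluster. The sketch below is an illustration only. It assumes the TensorFlow 2.2 wheels target CUDA 10.1 and cuDNN 7, and the file names in `CANDIDATE_LIBS` as well as the helper names are assumptions that should be compared against the "Could not load dynamic library" messages in the driver log.

```python
import ctypes

# Assumed library names for a TensorFlow 2.2 wheel (CUDA 10.1, cuDNN 7).
# Adjust to whatever the TensorFlow log says it failed to load.
CANDIDATE_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcudnn.so.7",
]


def loadable(name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def check_cuda_libs(names=CANDIDATE_LIBS):
    """Map each library name to whether it can be loaded."""
    return {name: loadable(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_cuda_libs().items():
        print(name, "found" if ok else "MISSING")
```

If any entry is reported as missing, installing only the Python package from the Libraries UI would not be enough, and the CUDA libraries would still have to be provided some other way, for example by a cluster init script.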
Answer 1:
Install tensorflow-gpu instead of tensorflow, as that package will run primarily on the GPU, while tensorflow will run primarily on the CPU. You won't need to edit the code, since the package is still imported under the name tensorflow.
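Whichever package ends up installed, a quick way to confirm the result is to place a small operation on the GPU and inspect where the output tensor lives. This is a hedged sketch rather than part of the answer: `first_gpu_or_none` is a hypothetical helper, and it returns None when TensorFlow is absent or no regular GPU device is registered.

```python
import importlib.util


def first_gpu_or_none():
    """Run a small matmul on GPU:0 and return the result's device string.

    Returns None if TensorFlow is missing or no 'GPU' device is registered.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    if not tf.config.list_physical_devices("GPU"):
        return None

    with tf.device("/GPU:0"):
        x = tf.random.uniform((256, 256))
        y = tf.matmul(x, x)
    # For eager tensors, .device is the full device name as a string.
    return y.device


if __name__ == "__main__":
    device = first_gpu_or_none()
    print("GPU in use:" if device else "No usable GPU found", device or "")
```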
Source: https://stackoverflow.com/questions/62489900/how-to-enable-gpu-visible-for-ml-runtime-environment-on-databricks