I've just built a deep learning rig (AMD 12-core Threadripper; GeForce RTX 2080 Ti; 64 GB RAM). I originally wanted to install cuDNN and CUDA on Ubuntu 19.0, but the install
Another way I found to check whether the GPU is being used (for Windows users) is to open Task Manager, select the GPU in the "Performance" tab, and switch one of the graphs to CUDA. Then simply run the script and watch the graph spike.
Also, adding this
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
before the Keras import hides the GPU from CUDA, so you can toggle between CPU and GPU. This also shows a remarkable difference (although for my simple network, the quicker CPU can be explained here).
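The toggle above can be wrapped in a small helper so the CPU and GPU runs differ by a single argument. This is only a sketch: `set_gpu_enabled` is a hypothetical name, and the check at the end assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
import os


def set_gpu_enabled(enabled: bool) -> None:
    """Show or hide CUDA devices for frameworks imported afterwards."""
    if enabled:
        # Removing the variable restores the default (all GPUs visible).
        # Note: this also discards any value you had set yourself.
        os.environ.pop("CUDA_VISIBLE_DEVICES", None)
    else:
        # "-1" matches no device, so CUDA reports no GPUs.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


set_gpu_enabled(False)  # flip to True for the GPU run

# Import the framework only after the variable is set.
try:
    import tensorflow as tf
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed; CUDA_VISIBLE_DEVICES =",
          os.environ["CUDA_VISIBLE_DEVICES"])
```

The variable is read when CUDA is initialised, so changing it after the framework has already been imported generally has no effect. Restart the interpreter between the two runs.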
You can see the following details here.
Based on the documentation:
If a TensorFlow operation has both CPU and GPU implementations,
by default, the GPU devices will be given priority when the operation is assigned to a device.
For example, tf.matmul has both CPU and GPU kernels.
On a system with devices CPU:0 and GPU:0, the GPU:0 device will be selected to run tf.matmul unless you explicitly request running it on another device.
Logging device placement
import tensorflow as tf

# Print the device each operation runs on
tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
Example Result
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
For Manual Device placement
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run the matmul outside the block, so it goes to the GPU by default
c = tf.matmul(a, b)
print(c)
Example Result:
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
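For a longer script the placement log gets long, so it can help to tally how many ops ran on each device. The sketch below assumes you have captured the log output as text and that each line follows the "Executing op ... in device ..." format shown above. `summarize_placement` is a hypothetical helper, not part of TensorFlow.

```python
import re
from collections import Counter

# Matches lines such as:
# Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
PLACEMENT_RE = re.compile(r"Executing op (\S+) in device \S*/device:(\w+):(\d+)")


def summarize_placement(log_text: str) -> Counter:
    """Count executed ops per device, keyed like 'GPU:0' or 'CPU:0'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PLACEMENT_RE.search(line)
        if match:
            _op, kind, index = match.groups()
            counts[f"{kind}:{index}"] += 1
    return counts


sample = """\
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
"""
print(summarize_placement(sample))
```

If the tally shows only CPU entries on a machine with a GPU, that suggests the GPU build or the CUDA/cuDNN install is not being picked up.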