This is the message I received after running a script to check whether TensorFlow is working:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUD
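For reference, a check script of this kind could look roughly like the sketch below. It is not the exact script that produced the message above; the function name `check_tensorflow` and the shape of the returned report are assumptions made for illustration. It only tries to import the package and read its version, so it also runs on a machine where TensorFlow is not installed.

```python
import importlib


def check_tensorflow():
    """Try to import TensorFlow and return a small status report.

    Hypothetical helper for illustration; it never raises if the
    package is missing, it just reports that in the result.
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError as exc:
        # TensorFlow is not installed or failed to load its native libraries.
        return {"installed": False, "version": None, "error": str(exc)}
    # Import succeeded; library loading messages (if any) are logged by TensorFlow itself.
    return {"installed": True, "version": getattr(tf, "__version__", None), "error": None}


if __name__ == "__main__":
    print(check_tensorflow())
```

If the import succeeds, any messages about loaded libraries are printed by TensorFlow itself while the script runs.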
This is the simplest method: it takes only one step.
It has a significant impact on speed. In my case, the time taken for a training step almost halved.
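To check a speed-up like this on your own setup, you can time a training step before and after the change. The following is a minimal sketch, not the measurement I used: `mean_step_time` and `dummy_step` are hypothetical names, and `dummy_step` is only a stand-in workload that you would replace with a call to your real training step.

```python
import time


def mean_step_time(step_fn, n_steps=100, warmup=10):
    """Return the average wall-clock seconds per call of step_fn."""
    # Warm-up calls are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps


def dummy_step():
    """Placeholder workload; swap in your own training step here."""
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    seconds = mean_step_time(dummy_step)
    print(f"{seconds * 1e3:.3f} ms per step")
```

Run it once with each build and compare the two per-step times.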
Refer to custom builds of TensorFlow.