We can assign different operations in a TensorFlow graph to different devices with tf.device('/cpu:0' or '/gpu:0'), but it is not clear how to divide them.
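For context, a minimal sketch of what manual placement looks like, assuming the TF1-style graph API (the device strings '/cpu:0' and '/device:GPU:0' are the standard names):

import tensorflow as tf  # TF1-style graph API

# Pin ops to devices explicitly.
with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')

with tf.device('/device:GPU:0'):
    # Fails on a CPU-only machine unless soft placement is enabled (see below).
    b = tf.matmul(a, a, name='b')

# log_device_placement prints where every op actually ended up.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))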
Finding a device in TF works as follows:
There is a readable test, https://github.com/tensorflow/tensorflow/blob/3bc73f5e2ac437b1d9d559751af789c8c965a7f9/tensorflow/core/grappler/costs/virtual_placer_test.cc#L26-L54, which boils down to:
TEST(VirtualPlacerTest, LocalDevices) {
  // Create a virtual cluster with a local CPU and a local GPU.
  std::unordered_map<string, DeviceProperties> devices;
  DeviceProperties cpu_device;
  cpu_device.set_type("CPU");
  devices[".../cpu:0"] = cpu_device;
  DeviceProperties gpu_device;
  gpu_device.set_type("GPU");
  devices[".../device:GPU:0"] = gpu_device;
  VirtualCluster cluster(devices);
  VirtualPlacer placer(&cluster);

  NodeDef node;
  node.set_op("Conv2D");
  // node.device() is empty, but GPU is the default device if there is one.
  EXPECT_EQ("GPU", placer.get_device(node).type());

  node.set_device("CPU");
  EXPECT_EQ("CPU", placer.get_device(node).type());

  node.set_device("GPU:0");
  EXPECT_EQ("GPU", placer.get_device(node).type());
}
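The same default behaviour can be observed from Python by enabling placement logging (a sketch assuming the TF1 graph API on a machine with at least one GPU):

import tensorflow as tf

# No device is requested for Conv2D; with a GPU present and a registered
# GPU kernel, the placer picks .../device:GPU:0 on its own.
x = tf.random_normal([1, 28, 28, 3])
w = tf.random_normal([3, 3, 3, 8])
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(y)  # the placement log shows where Conv2D was assigned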
Where does the default device come from? Each device factory is registered with a priority:
void DeviceFactory::Register(const string& device_type, DeviceFactory* factory, int priority)
The comment on that priority parameter is interesting, and a quick search shows: the TF placer uses the device with the higher priority whenever possible. So whenever a GPU is available, the op has a registered GPU kernel, and no manual assignment was made, the op is placed on the GPU.
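To see which devices are registered on your machine (and hence what the placer can choose from), one option is, for example:

from tensorflow.python.client import device_lib

# Prints every device TF registered locally (CPU, GPU, ...); when no device
# is assigned manually, the placer chooses among these by priority.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type, d.memory_limit)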
Your second question ("how to divide them") cannot be answered that easily if you care about efficiency. In most cases there is no need to place an operation on the CPU manually.
As a rule of thumb: trust the heuristics behind the scenes unless you have a concrete reason to assign devices manually.
Edit: Since the question was edited, here are some additional details:
Soft device placement (allow_soft_placement in the session config) only applies to nodes that cannot run on the intended device. Consider training on a GPU and then running inference on a laptop without one. Since each op kernel is registered only per device type (CPU, GPU), soft placement cannot distribute an op between different GPUs directly (they are the same device type).
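A minimal sketch of that laptop scenario, again assuming the TF1 graph API:

import tensorflow as tf

# Ops were pinned to the GPU when the graph was built for training.
with tf.device('/device:GPU:0'):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

# On a CPU-only laptop this session would fail with a placement error;
# allow_soft_placement lets TF fall back to the CPU kernels instead.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(y))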
There are essentially two ways to do distributed training, and in both cases you should care about where the variables are placed. I am not sure exactly what you are looking for, but TF allows you to balance the placement across all GPUs, as sketched below.
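For illustration, a minimal multi-GPU sketch in the TF1 API; the number of GPUs, the shapes, and the names are placeholders:

import tensorflow as tf

NUM_GPUS = 2  # hypothetical number of GPUs

# Keep the shared variables on the CPU (or a parameter server) and build
# one compute "tower" per GPU; gradients are then averaged across towers.
with tf.device('/cpu:0'):
    w = tf.get_variable('w', shape=[784, 10])

tower_logits = []
for i in range(NUM_GPUS):
    with tf.device('/device:GPU:%d' % i):
        x = tf.placeholder(tf.float32, [None, 784], name='x_%d' % i)
        tower_logits.append(tf.matmul(x, w))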
One further note: since I mainly use TensorPack, I know it supports distributed training in a very simple way, as illustrated in its distributed ResNet example. So to speak, it takes care of all of this behind the scenes.