We can assign different operations in a TensorFlow graph to different devices with tf.device('/cpu:0' or '/gpu:0'), but it is not clear how to divide them.
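For context, a minimal sketch of what manual placement looks like, assuming the TF1-style graph API (the device strings '/cpu:0' and '/device:GPU:0' are the standard names):

import tensorflow as tf  # TF1-style graph API

# Pin ops to devices explicitly.
with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')

with tf.device('/device:GPU:0'):
    # Fails on a CPU-only machine unless soft placement is enabled (see below).
    b = tf.matmul(a, a, name='b')

# log_device_placement prints where every op actually ended up.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))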
Finding a device in TF works as follows:
There is a readable test, https://github.com/tensorflow/tensorflow/blob/3bc73f5e2ac437b1d9d559751af789c8c965a7f9/tensorflow/core/grappler/costs/virtual_placer_test.cc#L26-L54, which boils down to:
TEST(VirtualPlacerTest, LocalDevices) {
  // Create a virtual cluster with a local CPU and a local GPU.
  std::unordered_map<string, DeviceProperties> devices;
  DeviceProperties cpu_device;
  cpu_device.set_type("CPU");
  devices[".../cpu:0"] = cpu_device;
  DeviceProperties gpu_device;
  gpu_device.set_type("GPU");
  devices[".../device:GPU:0"] = gpu_device;
  VirtualCluster cluster(devices);
  VirtualPlacer placer(&cluster);

  NodeDef node;
  node.set_op("Conv2D");
  // node.device() is empty, but GPU is the default device if there is one.
  EXPECT_EQ("GPU", placer.get_device(node).type());

  node.set_device("CPU");
  EXPECT_EQ("CPU", placer.get_device(node).type());

  node.set_device("GPU:0");
  EXPECT_EQ("GPU", placer.get_device(node).type());
}
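The same default behaviour can be observed from Python by enabling placement logging (a sketch assuming the TF1 graph API on a machine with at least one GPU):

import tensorflow as tf

# No device is requested for Conv2D; with a GPU present and a registered
# GPU kernel, the placer picks .../device:GPU:0 on its own.
x = tf.random_normal([1, 28, 28, 3])
w = tf.random_normal([3, 3, 3, 8])
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(y)  # the placement log shows where Conv2D was assigned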
Where does the default device come from? Each device factory is registered with a priority:
void DeviceFactory::Register(const string& device_type, DeviceFactory* factory, int priority)
The comment on that priority parameter is interesting, and a quick search shows: the TF placer uses the device with the higher priority whenever possible. So whenever a GPU is available, the op has a registered GPU kernel, and no manual assignment was made, the op is placed on the GPU.
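To see which devices are registered on your machine (and hence what the placer can choose from), one option is, for example:

from tensorflow.python.client import device_lib

# Prints every device TF registered locally (CPU, GPU, ...); when no device
# is assigned manually, the placer chooses among these by priority.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type, d.memory_limit)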
Your second question ("how to divide them") cannot be answered that easily if you care about efficiency. In most cases there is no need to place an operation on the CPU manually.
As a rule of thumb: trust the heuristics behind the scenes unless you have a concrete reason to assign devices manually.
Edit: Since the question was edited, here are some additional details:
Soft device placement (allow_soft_placement in the session config) only applies to nodes that cannot run on the intended device. Consider training on a GPU and then running inference on a laptop without one. Since each op kernel is registered only per device type (CPU, GPU), soft placement cannot distribute an op between different GPUs directly (they are the same device type).
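A minimal sketch of that laptop scenario, again assuming the TF1 graph API:

import tensorflow as tf

# Ops were pinned to the GPU when the graph was built for training.
with tf.device('/device:GPU:0'):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

# On a CPU-only laptop this session would fail with a placement error;
# allow_soft_placement lets TF fall back to the CPU kernels instead.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(y))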
There are essentially two ways to do distributed training, and in both cases you should care about where the variables are placed. I am not sure exactly what you are looking for, but TF allows you to balance the placement across all GPUs, as sketched below.
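For illustration, a minimal multi-GPU sketch in the TF1 API; the number of GPUs, the shapes, and the names are placeholders:

import tensorflow as tf

NUM_GPUS = 2  # hypothetical number of GPUs

# Keep the shared variables on the CPU (or a parameter server) and build
# one compute "tower" per GPU; gradients are then averaged across towers.
with tf.device('/cpu:0'):
    w = tf.get_variable('w', shape=[784, 10])

tower_logits = []
for i in range(NUM_GPUS):
    with tf.device('/device:GPU:%d' % i):
        x = tf.placeholder(tf.float32, [None, 784], name='x_%d' % i)
        tower_logits.append(tf.matmul(x, w))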
One further note: since I mainly use TensorPack, I know it supports distributed training in a very simple way, as illustrated in its distributed ResNet example. So to speak, it takes care of all of this behind the scenes.