Will adding GPU cards automatically scale tensorflow usage?

和自甴很熟 提交于 2019-12-24 02:23:18

问题


Suppose I can train with sample size N, batch size M and network depth L on my GTX 1070 card with tensorflow. Now, suppose I want to train with larger sample 2N and/or deeper network 2L and getting out of memory error.

Will plugging additional GPU cards automatically solve this problem (suppose, that total amount of memory of all GPU cards is sufficient to hold batch and it's gradients)? Or it is impossible with pure tensorflow?

I'v read, that there are bitcoin or etherium miners, that can build mining farm with multiple GPU cards and that this farm will mine faster.

Will mining farm also perform better for deep learning?


回答1:


Will plugging additional GPU cards automatically solve this problem?

No. You have to change your Tensorflow code to explicitly compute different operations on different devices (e.g: compute the gradients over a single batch on every GPU, then send the computed gradients to a coordinator that accumulates the received gradients and updates the model parameters averaging these gradients).

Also, Tensorflow is so flexible that allows you to specify different operations for every different device (or different remote nodes, it's the same). You could do data augmentation on a single computational node and let the others process the data without applying this function. You can execute certain operation on a device or set of devices only.

it is impossible with pure tensorflow?

It's possible with tensorflow, but you have to change the code you wrote for a single train/inference device.

I'v read, that there are bitcoin or etherium miners, that can build mining farm with multiple GPU cards and that this farm will mine faster. Will mining farm also perform better for deep learning?

Blockchains that work using POW (Proof Of Work) requires to solve a difficult problem using a brute-force like approach (they compute a lot's of hash with different inputs until they found a valid hash).

That means that if your single GPU can guess 1000 hash/s, 2 identical GPUs can guess 2 x 1000 hash/s.

The computation the GPUs are doing are completely uncorrelated: the data produced by the GPU:0 is not used by the GPU:1 and there are no synchronization points between the computations. This means that the task that a GPU do can be executed in parallel by another GPU (obviously with different inputs per GPU, so the devices compute hashes to solve different problems given by the network)

Back to Tensorflow: once you modified your code to work with different GPUs, you could train your network faster (in short because you're using bigger batches)



来源:https://stackoverflow.com/questions/45118918/will-adding-gpu-cards-automatically-scale-tensorflow-usage

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!