Question
I have a simulation program that requires a lot of data. I load the data onto the GPUs for computation, and there are many dependencies within the data. Since one GPU was not enough for the data, I upgraded to two GPUs. The limitation was that whenever I needed data that lived on the other GPU, it had to be copied to the host first.
So, if I use GPUDirect P2P, will the PCIe bus handle that much back-and-forth communication between the GPUs? Won't it result in deadlocks?
I am new to this, so I need some help and insight.
Answer 1:
PCI Express is full duplex, so it runs at full speed in both directions simultaneously. There should be no "deadlock" of the kind you might see in synchronous MPI communication, where both sides must complete a handshake before proceeding.
As Robert mentioned in a comment, "accessing data over PCIE bus is a lot slower than accessing it from on-board memory". However, it should still be significantly faster than transferring the data from GPU1 to the CPU and then from the CPU to GPU2, since you avoid copying it twice.
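To make the difference concrete, here is a minimal sketch of the two transfer paths. It assumes two GPUs visible as devices 0 and 1; the buffer names and sizes are made up for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t nbytes = 1 << 24;           // 16 MB, arbitrary size
    float *d_src, *d_dst;
    float *h_buf = (float *)malloc(nbytes);  // host staging buffer

    cudaSetDevice(0);
    cudaMalloc(&d_src, nbytes);              // source buffer on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&d_dst, nbytes);              // destination buffer on GPU 1

    // Without P2P: two hops, staged through host memory.
    cudaMemcpy(h_buf, d_src, nbytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(d_dst, h_buf, nbytes, cudaMemcpyHostToDevice);

    // With peer access enabled (see the next sketch), this is a single
    // direct copy across the PCIe bus, with no host round trip.
    cudaMemcpyPeer(d_dst, 1, d_src, 0, nbytes);

    free(h_buf);
    return 0;
}
```

Note that cudaMemcpyPeer falls back to staging through host memory when peer access is not enabled, so getting the direct path depends on the setup shown in the next sketch.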
You should try to minimize the number of GPU-to-GPU transfers, especially if you have to synchronize the data beforehand (as some algorithms require). However, you can also try to overlap kernel execution with the transfers, as in the sketch below. For details, see the Peer-to-Peer Memory Copy section of the CUDA C Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/#peer-to-peer-memory-copy
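As a rough sketch of what that looks like in practice (the kernel and buffer names are placeholders, and error checking is again omitted): check whether the devices can reach each other, enable peer access, then issue the peer copy on its own stream so that independent kernel work can overlap it.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *p, int n) {     // placeholder compute kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 22;
    const size_t nbytes = n * sizeof(float);
    float *d0, *d1, *d_work;

    cudaSetDevice(0);
    cudaMalloc(&d0, nbytes);                 // data to send from GPU 0
    cudaMalloc(&d_work, nbytes);             // independent work for GPU 0
    cudaSetDevice(1);
    cudaMalloc(&d1, nbytes);                 // destination on GPU 1

    // Enable direct access in both directions if the hardware allows it.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (can01 && can10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // flags argument must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }

    cudaSetDevice(0);
    cudaStream_t copy_stream, compute_stream;
    cudaStreamCreate(&copy_stream);
    cudaStreamCreate(&compute_stream);

    // Queue the peer copy on its own stream...
    cudaMemcpyPeerAsync(d1, 1, d0, 0, nbytes, copy_stream);

    // ...and run independent work in a second stream so the compute
    // overlaps the transfer instead of waiting behind it.
    scale<<<(n + 255) / 256, 256, 0, compute_stream>>>(d_work, n);

    cudaStreamSynchronize(copy_stream);
    cudaStreamSynchronize(compute_stream);
    return 0;
}
```

Note that cudaDeviceEnablePeerAccess is one-directional (it lets the current device access the peer's memory), which is why the sketch enables it from both sides.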
Source: https://stackoverflow.com/questions/27832273/gpudirect-peer-2-peer-using-pcie-bus-if-i-need-to-access-too-much-data-on-other