How do I explain performance variability over the PCIe bus?

无人及你 2021-01-25 02:08

In my CUDA program I see large variability between runs (up to 50%) in communication time, which includes host-to-device and device-to-host data transfers over PCIe.

1 answer
  • 2021-01-25 02:19

    The system you are running this on is a NUMA system: each of the two discrete CPUs in your host (the Opteron 6168 has two 6-core CPUs in a single package) has its own memory controller, and there may be a different number of HyperTransport hops between each CPU's memory and the PCIe controller hosting your CUDA device.

    This means that, depending on CPU affinity, the thread that runs your bandwidth tests may see different latency to both host memory and the GPU. This would explain the differences in timings that you are seeing.
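
    One way to test this explanation is to take CPU affinity out of the picture by pinning the host thread to one core before doing any transfers. The following is a minimal sketch, assuming Linux with glibc; the helper names `first_allowed_cpu` and `pin_to_cpu` are hypothetical and not part of CUDA or of the original program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes sched_setaffinity and the CPU_* macros on glibc */
#endif
#include <sched.h>

/* Hypothetical helper: lowest-numbered CPU the calling thread may
 * currently run on, or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success and -1 on failure (including an out-of-range id). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t mask;
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

    Calling `pin_to_cpu(first_allowed_cpu())` at startup, before allocating host buffers and issuing the `cudaMemcpy` calls, should keep every run on the same core. Repeating the measurement while pinned to a core on the other CPU would then show whether the spread follows NUMA placement. Running the unmodified program under `numactl --cpunodebind=0 --membind=0` is an alternative that also fixes which node host memory is allocated from.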
