Calculating performance of CUFFT

孤城傲影 2020-12-12 00:46

I am running CUFFT on chunks (N*N/p) divided across multiple GPUs, and I have a question about calculating the performance. First, a bit about how I am doing it:

1 Answer
  • 2020-12-12 01:04

    If you are doing a complex transform, the operation count is correct (it should be 2.5 N log2(N) for a real-valued transform), but the GFLOP formula is incorrect. In a parallel, multiprocessor operation, the usual calculation of throughput is

    operation count / wall clock time
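
    As a rough illustration of that ratio, here is a minimal sketch. The function names `fft_op_count` and `gflops` are hypothetical, and the factor 5 for a complex transform is the conventional nominal estimate, assumed here because the question's own formula is not shown; the factor 2.5 for a real-valued transform is the one quoted above.

```python
import math


def fft_op_count(n_points: int, complex_input: bool = True) -> float:
    """Nominal floating-point operation count for an FFT over n_points values.

    Assumption: 5 * N * log2(N) for a complex transform (conventional
    estimate), 2.5 * N * log2(N) for a real-valued transform.
    """
    factor = 5.0 if complex_input else 2.5
    return factor * n_points * math.log2(n_points)


def gflops(op_count: float, wall_clock_seconds: float) -> float:
    """Throughput = operation count / wall clock time, scaled to GFLOP/s."""
    return op_count / wall_clock_seconds / 1e9


if __name__ == "__main__":
    ops = fft_op_count(1024)   # hypothetical transform size
    print(gflops(ops, 1e-6))   # hypothetical measured wall clock time
```

    The wall clock time passed in should cover the whole operation as the caller sees it, not the sum of the per-GPU timings.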
    

    In your case, presuming the GPUs are operating in parallel, either measure the wall clock time (i.e. how long the whole operation took) and use that as the execution time, or use this:

    execution time = max(memcpyHtoD + kernel + memcpyDtoH times for row and col FFT for each GPU)
    

    As it stands, your calculation represents the serial execution time. Allowing for the overheads of the multi-GPU scheme, I would expect the calculated performance numbers you are getting to be lower than those for the equivalent transform done on a single GPU.
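
    To make the difference between the serial and the parallel figure concrete, here is a small sketch. The timings are made-up placeholder values in milliseconds, not measurements, and the names `gpu_total`, `parallel_time` and `serial_time` are hypothetical. Each GPU entry holds (memcpyHtoD, kernel, memcpyDtoH) times for its row FFT and its column FFT.

```python
# Hypothetical per-GPU timings in milliseconds (illustrative values only).
# Each tuple is (memcpyHtoD, kernel, memcpyDtoH).
timings = [
    {"row": (2, 5, 2), "col": (2, 6, 2)},  # GPU 0
    {"row": (2, 5, 3), "col": (2, 7, 2)},  # GPU 1
]


def gpu_total(gpu: dict) -> int:
    """Total time one GPU spends on copies and kernels for row and col FFTs."""
    return sum(sum(stage) for stage in gpu.values())


def parallel_time(per_gpu: list) -> int:
    """Execution time when GPUs overlap fully: the slowest GPU dominates."""
    return max(gpu_total(gpu) for gpu in per_gpu)


def serial_time(per_gpu: list) -> int:
    """Execution time if every GPU's work is simply added up."""
    return sum(gpu_total(gpu) for gpu in per_gpu)


if __name__ == "__main__":
    print("parallel:", parallel_time(timings), "ms")
    print("serial:  ", serial_time(timings), "ms")
```

    Dividing the operation count by the summed time understates the throughput whenever the GPUs actually overlap, which is why the max over GPUs (or a direct wall clock measurement) is the figure to use.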
