Counting FLOPS/GFLOPS in program - CUDA

问题

Already finished my application which multiplies CRS matrix and vector (SpMV) and the only thing to do now is to count FLOPS my application did. In my opinion it's really hard to estimate number of floating point operation in case of sparse matrix - vector multiplication, because the number of multiplies in one row is really "jumpy" or fluent.

I only tried to measure time using "cudaprof" ( available in ./CUDA/bin directory) - it works fine.

Any sugestions and instruction pastes appreciated !

回答1:

That's not just your opinion; it's simple fact that the number of operations in the case of a sparse matrix is data-dependent, and so you can't get a reasonable answer without knowing something about the data. That makes it impossible to have a one-number-fits-all-data estimate.

This is probably one of the sorts of situations where you could think hard about it for many hours (and do lots of research) to make a maybe-accurate estimate, or you could spend a few minutes writing a variant of your existing implementation that increments a counter each time it does an operation. Sure, that's going to take quite a while to run (especially if you don't do it in a CUDA-enabled form), but probably a lot less time than it would take to do the thinking, and when the answer comes out, you don't have to do a lot of work to convince yourself that it's right.

来源：https://stackoverflow.com/questions/2795336/counting-flops-gflops-in-program-cuda

标签

cuda

nvidia

flops

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!