Question
I have a somewhat large CUDA application and I need to calculate the attained GFLOPs. I'm looking for an easy and perhaps generic way of counting the number of floating point operations.
Is it possible to count floating point operations from the generated PTX code (as shown below), using a predefined list of floating point instructions in the assembly language? Based on the code, can the counting be made generic? For example, does add.s32 %r58, %r8, -2; count as one floating point operation?
EXAMPLE:
BB3_2:
.loc 2 108 1
mov.u32 %r8, %r79;
setp.ge.s32 %p1, %r78, %r16;
setp.lt.s32 %p2, %r78, 0;
or.pred %p3, %p2, %p1;
@%p3 bra BB3_5;
add.s32 %r58, %r8, -2;
setp.lt.s32 %p4, %r58, 0;
setp.ge.s32 %p5, %r58, %r15;
or.pred %p6, %p4, %p5;
@%p6 bra BB3_5;
.loc 2 112 1
ld.global.u8 %rc1, [%rd17];
cvt.rn.f32.u8 %f11, %rc1;
mul.wide.u32 %rd12, %r80, 4;
add.s64 %rd13, %rd7, %rd12;
ld.local.f32 %f12, [%rd13];
fma.rn.f32 %f14, %f11, %f12, %f14;
.loc 2 113 1
add.f32 %f15, %f15, %f12;
Or are there far simpler ways of counting FPOs and this is a waste of time?
Answer 1:
The easiest way to count FLOPS would be to have the CUDA profiler do it for you. By selecting the Achieved FLOPS experiment, you get a Floating Point Operations chart that displays a count of each type of floating point operation executed by your kernel.
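If you still want to try the PTX route from the question, here is a minimal sketch of a static counter in Python. It is not the profiler's method, and the names in it (FLOP_WEIGHTS, FLOAT_TYPES, count_static_flops) are made up for illustration. It assumes an instruction is floating point only when its type suffix is f16, f32 or f64, and it counts fma/mad as two operations, which is a common convention rather than a rule. Under that assumption add.s32 is a signed 32-bit integer add and does not count. Note that this only counts instructions in the listing: anything inside a loop or behind a branch executes a different number of times at run time, so the result is not the executed FLOP total you need for GFLOPS.

```python
import re
from collections import Counter

# Assumed FLOP weight per PTX opcode (illustrative, not an official list).
# fma/mad are counted as two operations: one multiply plus one add.
FLOP_WEIGHTS = {
    "add": 1, "sub": 1, "mul": 1, "div": 1,
    "fma": 2, "mad": 2,
    "sqrt": 1, "rsqrt": 1, "rcp": 1,
    "min": 1, "max": 1,
}

# Assumed set of type suffixes that mark a floating point instruction.
FLOAT_TYPES = {"f16", "f32", "f64"}


def count_static_flops(ptx_text):
    """Return a Counter mapping each FP mnemonic to its weighted static count."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()          # drop trailing comments
        line = re.sub(r"^@!?%\w+\s+", "", line)    # drop a guard such as "@%p3"
        # Skip blanks, directives (".loc ...") and labels ("BB3_2:").
        if not line or line.startswith(".") or line.endswith(":"):
            continue
        mnemonic = line.split()[0]                 # e.g. "fma.rn.f32"
        opcode, *suffixes = mnemonic.split(".")
        # For arithmetic opcodes the operand type is the last suffix.
        if opcode in FLOP_WEIGHTS and suffixes and suffixes[-1] in FLOAT_TYPES:
            counts[mnemonic] += FLOP_WEIGHTS[opcode]
    return counts


if __name__ == "__main__":
    sample = """
    add.s32 %r58, %r8, -2;
    cvt.rn.f32.u8 %f11, %rc1;
    mul.wide.u32 %rd12, %r80, 4;
    fma.rn.f32 %f14, %f11, %f12, %f14;
    add.f32 %f15, %f15, %f12;
    """
    result = count_static_flops(sample)
    print(dict(result))            # {'fma.rn.f32': 2, 'add.f32': 1}
    print(sum(result.values()))    # 3
```

On the lines taken from the excerpt, only fma.rn.f32 and add.f32 are counted. The integer add, the conversion (cvt) and the address arithmetic (mul.wide.u32) are skipped. Whether conversions or transcendental instructions should count is a judgment call you would have to make for your own definition of a FLOP.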
Source: https://stackoverflow.com/questions/14812446/a-method-of-counting-floating-point-operations-in-a-c-cuda-program-using-ptx