问题
Is there an easy way to count the number of multiplications actually executed by a piece of standard C code? The code I have in mind basically just does additions and multiplications, and it's the multiplications that are of primary interest, but it wouldn't hurt to get counts of the other operations as well.
If it were an option, I suppose I could go around replacing 'a * b' with 'multiply(a, b)' and write a cover function for the native * operator, b/c I really don't care about time performance during this test, but the primary objection to doing that is having to re-work a pile of source code just to run the test.
I have no objection to re-compiling the source, perhaps against some library or with obscure (afaik) options. Valgrind came to mind, but if I understand valgrind's purpose, that's more about tracing values than counting operations.
回答1:
If your compiler supports soft-float (i.e. using functions with integer implementations to emulate floating-point), you could compiler your program in that mode (-msoft-float
in GCC), and use your favorite profiling tool to measure how many times they are invoked.
Many processors also have performance counters that can count the number of floating-point operations that have been retired. Depending on the hardware and OS, you may or may not need some amount of kernel support to take advantage of them.
回答2:
Compile the source code into assembly language and then search for the multiply instructions.
Note that the optimization level can greatly affect the number that appear. For loops, you would have to determine the scope of multiplies within a loop and factor that into the result, but if the code is fairly constrained or limited in extent, that should be straightforward.
回答3:
Note: a shameless extrapolation of my comment for as much rep as I can skim.
PAPI has two high-level API functions called PAPI_flips and PAPI_flops which can be used to record the FLOPS as well as the number of floating point operations. Additionally, PAPI offers lots of other performance counter monitoring capability, depending on your processor architecture... cache, bus, memory, branches, etc. I think there is support or support is emerging for graphics accelerators and CUDA/GPGPU.
PAPI will need to be installed on your system, but I think it's widespread enough that installation wouldn't be too painful, if you know what you're doing.
The nice thing about PAPI is that you don't need to know anything about the code; just instrument it (the interface is the same as a stopwatch for FLOPS) and run it. It's based on the actual dynamic execution of your program, so it takes into account things that are hard to account for analytically, such as (pseudo-)random behavior, user/variable input, and related branches.
回答4:
The best that I can think of is (assuming you're running gdb):
If you could identify the points were multiplications are occurring, you could then set tracepoints just prior to the multiplication (or perhaps just after them depending on the details), then run the program and count the number of tracepoint dumps.
Yes, it is very crude. Certainly there are other solutions; however, I would hesitate to trash my stack for something as simple as a count.
来源:https://stackoverflow.com/questions/7854890/profiling-floating-point-usage-in-c