Say I have a C program that, in rough pseudocode, is:
For i=0 to 10
    x++
    a=2+x*5
next
Is the number of FLOPs for this (1 [x++] + 1 [x*5] + 1 [2+(x*5)]) * 10 [loop], for 30 FLOPs? I am having trouble understanding what a FLOP is.
Note the [...] are indicating where I am getting my counts for "operations" from.
For the purposes of FLOPS measurements, usually only additions and multiplications are included. Things like divisions, reciprocals, square roots, and transcendental functions are too expensive to include as a single operation, while things like loads and stores are too trivial.
In other words, your loop body contains 2 adds and 1 multiply, so (assuming x is floating point) each loop iteration is 3 ops; if you run the loop 10 times you've done 30 ops.
Note that when measuring MIPS, your loop would be more than 3 instructions because it also includes loads and stores that the FLOPS measurement doesn't count.
FLOPS stands for floating-point operations per second. If you are dealing with integers then you don't have any floating-point operations in your code.
The other posters have made it clear that FLOPS (detailed here) are concerned with floating-point (as opposed to integer) operations per second, so you have to count not only how many operations you're performing, but also the period of time in which you perform them.
If "x" and "a" are floats, you're making a good attempt at counting the number of operations in your code, but you'd have to check the object code to see how many floating-point instructions are actually used. E.g., if "a" is not subsequently used, an optimizing compiler might not bother to compute it.
Also, some floating-point operations (such as adding) might be much faster than others (such as multiplying), so a loop of only float adds could run at many more FLOPS than a loop of only float multiplies on the same machine.
FLOPs (the lowercase s indicates the plural of FLOP, per Martinho Fernandes's comment) refer to machine-language floating-point instructions, so the count depends on how many instructions your code compiles down to.
First off, if all of these variables are integers, then there are no FLOPs in this code. Let's assume, however, that your language recognizes all of these constants and variables as single-precision floating point variables (using single-precision makes loading the constants easier).
This code could compile to (on MIPS):
Assignment of variables: x is in $f1, a is in $f2, i is in $f3.
All other floating point registers are compiler-generated temporaries.
$f4 stores the loop exit condition of 10.0
$f5 stores the floating point constant 1.0
$f6 stores the floating point constant 2.0
$t1 is an integer register used for loading constants into the floating point coprocessor.
    lui $t1, *upper half of 0.0*
    ori $t1, $t1, *lower half of 0.0*
    mtc1 $t1, $f3          # move the bit pattern into the FPU: i = 0.0
    lui $t1, *upper half of 10.0*
    ori $t1, $t1, *lower half of 10.0*
    mtc1 $t1, $f4          # loop limit 10.0
    lui $t1, *upper half of 1.0*
    ori $t1, $t1, *lower half of 1.0*
    mtc1 $t1, $f5          # constant 1.0
    lui $t1, *upper half of 2.0*
    ori $t1, $t1, *lower half of 2.0*
    mtc1 $t1, $f6          # constant 2.0
st: c.gt.s $f3, $f4        # exit once i > 10.0
    bc1t end
    add.s $f1, $f1, $f5    # x++
    lui $t1, *upper half of 5.0*
    ori $t1, $t1, *lower half of 5.0*
    mtc1 $t1, $f2          # a = 5.0
    mul.s $f2, $f2, $f1    # a = 5.0 * x
    add.s $f2, $f2, $f6    # a = a + 2.0
    add.s $f3, $f3, $f5    # i = i + 1.0
    j st
end: # first statement after the loop
So according to Gabe's definition, there are 4 FLOPs inside the loop (3x add.s and 1x mul.s). There are 5 FLOPs if you also count the loop comparison c.gt.s. Multiply this by 10 for a total of 40 (or 50) FLOPs used by the program.
A better optimizing compiler might recognize that the value of a isn't used inside the loop, so it only needs to compute the final value of a. It could generate code that looks like
    lui $t1, *upper half of 0.0*
    ori $t1, $t1, *lower half of 0.0*
    mtc1 $t1, $f3          # move the bit pattern into the FPU: i = 0.0
    lui $t1, *upper half of 10.0*
    ori $t1, $t1, *lower half of 10.0*
    mtc1 $t1, $f4          # loop limit 10.0
    lui $t1, *upper half of 1.0*
    ori $t1, $t1, *lower half of 1.0*
    mtc1 $t1, $f5          # constant 1.0
    lui $t1, *upper half of 2.0*
    ori $t1, $t1, *lower half of 2.0*
    mtc1 $t1, $f6          # constant 2.0
st: c.gt.s $f3, $f4        # exit once i > 10.0
    bc1t end
    add.s $f1, $f1, $f5    # x++
    add.s $f3, $f3, $f5    # i = i + 1.0
    j st
end: lui $t1, *upper half of 5.0*
    ori $t1, $t1, *lower half of 5.0*
    mtc1 $t1, $f2          # a = 5.0
    mul.s $f2, $f2, $f1    # a = 5.0 * x, computed once after the loop
    add.s $f2, $f2, $f6    # a = a + 2.0
In this case, you have 2 adds and 1 comparison inside the loop (multiplied by 10 gives you 20 or 30 FLOPs), plus 1 multiplication and 1 addition outside the loop. Thus, your program now takes 22 or 32 FLOPs, depending on whether we count comparisons.
Is x an integer or a floating-point variable? If it's an integer, then your loop may not contain any flops.
Source: https://stackoverflow.com/questions/3592657/what-counts-as-a-flop