In C, I have a task where I must do multiplication, inversion, transposition, addition, etc. with huge matrices allocated as 2-dimensional arrays (arrays of
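Loop unrolling works best when the compiler can predict the exact number of iterations of the loop at compile time. According to the GCC documentation, -funroll-loops unrolls loops whose number of iterations can be determined at compile time or upon entry to the loop. This means that if your matrix size is only known at run time, the flag may help much less than you hope: the compiler has to emit extra code to handle the leftover iterations, and the gain depends heavily on the loop body.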
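To make the idea concrete, here is a minimal sketch of what unrolling by 4 amounts to when the iteration count is only known at run time. The function names (sum_plain, sum_unrolled4, check_unrolled_sum) are made up for illustration, and this is hand-written code, not what gcc actually emits:

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one addition and one loop test per element. */
long sum_plain(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: one loop test per four elements,
 * plus a remainder loop because n need not be a multiple of 4. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)   /* leftover 0..3 elements */
        s += a[i];
    return s;
}

/* Sanity check: both versions must agree for any length. */
void check_unrolled_sum(const int *a, size_t n)
{
    assert(sum_plain(a, n) == sum_unrolled4(a, n));
}
```

The remainder loop is the extra code mentioned above: it is the price of not knowing n in advance, and it is part of why unrolled code is bigger.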
Now to answer your questions:
a) Does gcc include this kind of optimization with the various optimization flags as -O1, -O2 etc.?
Nope, you have to set -funroll-loops explicitly, since it may or may not make the code run faster and it usually makes the executable bigger.
b) Do I have to use any pragmas inside my code to take advantage of loop unrolling or are loops identified automatically?
No pragmas. With -funroll-loops the compiler heuristically decides which loops to unroll. If you want to force unrolling you can use -funroll-all-loops, but it usually makes the code run slower.
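If you do want per-loop control anyway, newer gcc releases (8 and later, as far as I know) accept #pragma GCC unroll n directly in front of a loop. This is not needed for -funroll-loops to work. A minimal sketch, where scale_row is a made-up helper and the factor 4 is an arbitrary choice, not a recommendation:

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of one matrix row by k.
 * The pragma asks gcc to unroll this particular loop 4 times;
 * compilers that do not know the pragma ignore it (possibly with a warning). */
void scale_row(double *row, size_t n, double k)
{
    assert(row != NULL || n == 0);

#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        row[i] *= k;
}
```

Whether this beats the compiler's own heuristics is something you have to measure.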
c) Why is this option not enabled by default if the unrolling increases the performance?
It doesn't always increase performance! Also, not everything is about performance. Some people actually care about having small executables because they have little memory (see: embedded systems).
d) What are the recommended gcc optimization flags to compile the program in the best way possible? (I must run this program optimized for a single CPU family, which is the same as that of the machine where I compile the code; currently I use the -march=native and -O2 flags.)
There's no silver bullet. You'll need to think, test and see. There is actually a theorem (the "full employment theorem" for compiler writers) stating that no perfectly optimizing compiler can ever exist.
Did you profile your program? Profiling is a very useful skill for these things.
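Short of a real profiler such as gprof (compile with -pg), even a crude timing harness lets you compare builds, e.g. one compiled with -O2 -march=native and one with -funroll-loops added. A minimal sketch follows; matmul and time_matmul are hypothetical names, and I'm assuming square row-major matrices stored in flat arrays rather than your 2-dimensional layout:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* C = A * B for n x n row-major matrices stored in flat arrays. */
void matmul(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

/* Returns the CPU time in seconds spent in `reps` calls of matmul. */
double time_matmul(const double *a, const double *b, double *c,
                   size_t n, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        matmul(a, b, c, n);
    clock_t t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Run the same binary several times with realistic matrix sizes and compare the numbers between builds; a single run on a tiny matrix tells you very little.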
Source (mostly): https://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html