I'm trying to implement the code shown in this pdf. More precisely (page 50):
#define SM (CLS / sizeof (double))
for (i = 0; i < N; i += SM)
    for (j = 0