For a current project I'm trying to learn the basics of OpenMP offloading. I tried to use cache blocking to speed things up, but my program runs 10x slower in comparison to