Poor performance due to hyper-threading with OpenMP: how to bind threads to cores

后端 未结 1 1087
长情又很酷
长情又很酷 2021-02-07 08:27

I am developing large dense matrix multiplication code. When I profile the code it sometimes gets about 75% of the peak flops of my four core system and other times gets about

相关标签:
1条回答
  • 2021-02-07 09:19

    This isn't a direct answer to your question, but it might be worth looking in to: apparently, hyperthreading can cause your cache to thrash. Have you tried checking out valgrind to see what kind of issue is causing your problem? There might be a quick fix to be had from allocating some junk at the top of every thread's stack so that your threads don't end up kicking each others cache lines out.

    It looks like your CPU is 4-way set associative so it's not insane to think that, across 8 threads, you might end up with some really unfortunately aligned accesses. If your matrices are aligned on a multiple of the size of your cache, and if you had pairs of threads accessing areas a cache-multiple apart, any incidental read by a third thread would be enough to start causing conflict misses.

    For a quick test -- if you change your input matrices to something that's not a multiple of your cache size (so they're no longer aligned on a boundary) and your problems disappear, then there's a good chance that you're dealing with conflict misses.

    0 讨论(0)
提交回复
热议问题