performance of NumPy with different BLAS implementations

∥☆過路亽.° 提交于 2019-11-28 11:01:28

The reason for this behavior could be that Accelerate uses multithreading, while the others don't.

Most BLAS implementations follow the environment variable OMP_NUM_THREADS to determine how many threads to use. I believe they only use 1 thread if not told otherwise explicitly. Accelerate's man page, however sounds like threading is turned on by default; it can be turned off by setting the environment variable VECLIB_MAXIMUM_THREADS.

To determine if this is really what's happening, try

export VECLIB_MAXIMUM_THREADS=1

before calling the Accelerate version, and

export OMP_NUM_THREADS=4

for the other versions.

Independent of whether this is really the reason, it's a good idea to always set these variables when you use BLAS to be sure you control what is going on.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!