performance of NumPy with different BLAS implementations

前端未结

关注

 1  1197

I\'m running an algorithm that is implemented in Python and uses NumPy. The most computationally expensive part of the algorithm involves solving a set of linear sys

相关标签:

1条回答

不要未来只要你来

2020-12-10 13:51

The reason for this behavior could be that Accelerate uses multithreading, while the others don't.

Most BLAS implementations follow the environment variable OMP_NUM_THREADS to determine how many threads to use. I believe they only use 1 thread if not told otherwise explicitly. Accelerate's man page, however sounds like threading is turned on by default; it can be turned off by setting the environment variable VECLIB_MAXIMUM_THREADS.

To determine if this is really what's happening, try

export VECLIB_MAXIMUM_THREADS=1

before calling the Accelerate version, and

export OMP_NUM_THREADS=4

for the other versions.

Independent of whether this is really the reason, it's a good idea to always set these variables when you use BLAS to be sure you control what is going on.

0 讨论(0)

发布评论:

提交评论

加载中...

验证码

看不清?

提交回复