Performance of NumPy with different BLAS implementations

鱼传尺愫 2020-12-10 13:45

I'm running an algorithm that is implemented in Python and uses NumPy. The most computationally expensive part of the algorithm involves solving a set of linear sys

1 answer
  • The reason for this behavior could be that Accelerate uses multithreading, while the others don't.

    Most BLAS implementations read the environment variable OMP_NUM_THREADS to decide how many threads to use. I believe they use only one thread unless explicitly told otherwise. Accelerate's man page, however, suggests that threading is turned on by default; it can be limited (to a single thread, for example) by setting the environment variable VECLIB_MAXIMUM_THREADS.

    To determine if this is really what's happening, try

    export VECLIB_MAXIMUM_THREADS=1
    

    before calling the Accelerate version, and

    export OMP_NUM_THREADS=4
    

    for the other versions.

    Independent of whether this is really the reason, it's a good idea to always set these variables when you use BLAS to be sure you control what is going on.
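
If you would rather pin the thread counts from Python than from the shell, one option is to launch the benchmark in a child process with the variables already set, since BLAS libraries typically read them when they are first loaded. The following is only a minimal sketch: `blas_env` and `run_with_threads` are hypothetical helper names, not part of NumPy or any BLAS library, and it assumes your BLAS honors OMP_NUM_THREADS or VECLIB_MAXIMUM_THREADS as described above.

```python
import os
import subprocess
import sys


def blas_env(threads, base=None):
    """Return a copy of the environment with the BLAS thread variables set.

    Hypothetical helper: sets both variables mentioned in the answer, so the
    same call works for Accelerate and for OpenMP-based BLAS builds.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)
    env["VECLIB_MAXIMUM_THREADS"] = str(threads)
    return env


def run_with_threads(cmd, threads):
    """Run `cmd` in a child process that sees the requested thread count."""
    return subprocess.run(
        cmd,
        env=blas_env(threads),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_with_threads([sys.executable, "benchmark.py"], 1)` would run a (hypothetical) `benchmark.py` single-threaded, and calling it again with `4` gives the multithreaded comparison. Running each configuration in its own process avoids the question of whether the variable was set before NumPy was imported.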
