Using mkl_set_num_threads with numpy

≯℡__Kan透↙ posted on 2019-11-29 03:38:48

Ophion led me the right way. Despite what the documentation says, one has to pass the parameter of mkl_set_num_threads by reference.

I have now defined two functions, one for getting and one for setting the number of threads:

import numpy
import ctypes

mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_get_max_threads = mkl_rt.mkl_get_max_threads

def mkl_set_num_threads(cores):
    # The lowercase symbol expects a pointer, so pass the int by reference.
    mkl_rt.mkl_set_num_threads(ctypes.byref(ctypes.c_int(cores)))

mkl_set_num_threads(4)
print(mkl_get_max_threads())  # says 4

and they work as expected.

Edit: according to Rufflewind, the C functions have mixed-case names (e.g. MKL_Set_Num_Threads) and take their parameters by value:

import ctypes

mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_set_num_threads = mkl_rt.MKL_Set_Num_Threads
mkl_get_max_threads = mkl_rt.MKL_Get_Max_Threads

Long story short, use MKL_Set_Num_Threads and its CamelCased friends when calling MKL from Python. The same applies to C if you don't #include <mkl.h>.
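
A minimal sketch of the by-value route, assuming the library file names used elsewhere in this thread (versioned names on your system may differ). The helpers load_mkl and bind_by_value are made up for this illustration; declaring argtypes and restype simply tells ctypes to pass a plain C int and to read an int back.

```python
import ctypes


def load_mkl(names=("libmkl_rt.so", "mkl_rt.dll")):
    """Return the MKL runtime as a ctypes library, or None if none of the names load."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def bind_by_value(lib):
    """Declare the C prototypes so the int is passed by value (hypothetical helper)."""
    lib.MKL_Set_Num_Threads.argtypes = [ctypes.c_int]
    lib.MKL_Set_Num_Threads.restype = None
    lib.MKL_Get_Max_Threads.argtypes = []
    lib.MKL_Get_Max_Threads.restype = ctypes.c_int
    return lib.MKL_Set_Num_Threads, lib.MKL_Get_Max_Threads


mkl = load_mkl()
if mkl is None:
    print("MKL runtime not found")
else:
    set_threads, get_max_threads = bind_by_value(mkl)
    set_threads(2)            # plain int, no byref needed
    print(get_max_threads())
```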


The MKL documentation seems to suggest that the correct type signature in C is:

void mkl_set_num_threads(int nt);

Okay, let's try a minimal program then:

void mkl_set_num_threads(int);
int main(void) {
    mkl_set_num_threads(1);
    return 0;
}

Compile it with GCC and boom, Segmentation fault again. So it seems the problem isn't restricted to Python.

Running it through a debugger (GDB) reveals:

Program received signal SIGSEGV, Segmentation fault.
0x0000… in mkl_set_num_threads_ ()
   from /…/mkl/lib/intel64/libmkl_intel_lp64.so

Wait a second, mkl_set_num_threads_?? That's the Fortran version of mkl_set_num_threads! How did we end up calling the Fortran version? (Keep in mind that Fortran's calling convention requires arguments to be passed as pointers rather than by value.)
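
The difference between the two conventions can be imitated in pure Python with ctypes callbacks. This is only a stand-in: fortran_style and c_style are hypothetical functions written for this illustration, not MKL symbols.

```python
import ctypes

# Fortran-style prototype: the integer arrives as a pointer.
FortranProto = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int))
# C-style prototype: the integer arrives by value.
CProto = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


@FortranProto
def fortran_style(nt_ptr):
    # The callee has to dereference the pointer to read the thread count.
    return nt_ptr[0]


@CProto
def c_style(nt):
    # The callee receives the thread count directly.
    return nt


n = ctypes.c_int(4)
by_ref = fortran_style(ctypes.byref(n))  # caller passes the address of n
by_val = c_style(4)                      # caller passes the value itself
print(by_ref, by_val)
```

Handing a by-value integer to a pointer-taking routine makes it treat that integer as an address and dereference it, which is presumably the invalid read behind the segfault above.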

It turns out the documentation was a complete façade. If you actually inspect the header files for the recent versions of MKL, you will find this cute little definition:

void    MKL_Set_Num_Threads(int nth);
#define mkl_set_num_threads         MKL_Set_Num_Threads

… and now everything makes sense! The correct function to call (for C code) is MKL_Set_Num_Threads, not mkl_set_num_threads. Inspecting the symbol table reveals that there are actually four different variants defined:

nm -D /…/mkl/lib/intel64/libmkl_rt.so | grep -i mkl_set_num_threads
00000000000e3060 T MKL_SET_NUM_THREADS
…
00000000000e30b0 T MKL_Set_Num_Threads
…
00000000000e3060 T mkl_set_num_threads
00000000000e3060 T mkl_set_num_threads_
…

Why did Intel put in four different variants of one function despite there being only C and Fortran variants in the documentation? I don't know for certain, but I suspect it's for compatibility with different Fortran compilers. You see, Fortran calling convention is not standardized. Different compilers will mangle the names of the functions differently:

  • some use upper case,
  • some use lower case with a trailing underscore, and
  • some use lower case with no decoration at all.

There may even be other ways that I'm not aware of. This trick allows the MKL library to be used with most Fortran compilers without any modification, the downside being that C functions need to be "mangled" to make room for the 3 variants of the Fortran calling convention.
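
To make the three spellings concrete, here is a small sketch that generates them for a routine name and checks which ones a loaded library exports. Both fortran_symbol_variants and exported are hypothetical helpers invented for this example; they only mirror the list above and do not cover other mangling schemes.

```python
def fortran_symbol_variants(name):
    """Return the three Fortran-style spellings listed above for a routine name."""
    base = name.lower().rstrip("_")
    return [
        base.upper(),  # upper case
        base + "_",    # lower case with a trailing underscore
        base,          # lower case with no decoration
    ]


def exported(lib, names):
    """Return the names that resolve as attributes (symbols) of lib."""
    return [n for n in names if hasattr(lib, n)]


variants = fortran_symbol_variants("mkl_set_num_threads")
print(variants)
```

With a real library loaded, for example lib = ctypes.CDLL('libmkl_rt.so'), exported(lib, variants) should list whichever of these spellings that build actually exports.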

For people looking for the complete solution, you can use a context manager:

import ctypes


class MKLThreads(object):
    _mkl_rt = None

    @classmethod
    def _mkl(cls):
        if cls._mkl_rt is None:
            try:
                cls._mkl_rt = ctypes.CDLL('libmkl_rt.so')
            except OSError:
                cls._mkl_rt = ctypes.CDLL('mkl_rt.dll')
        return cls._mkl_rt

    @classmethod
    def get_max_threads(cls):
        return cls._mkl().mkl_get_max_threads()

    @classmethod
    def set_num_threads(cls, n):
        assert isinstance(n, int)
        cls._mkl().mkl_set_num_threads(ctypes.byref(ctypes.c_int(n)))

    def __init__(self, num_threads):
        self._n = num_threads
        self._saved_n = self.get_max_threads()

    def __enter__(self):
        self.set_num_threads(self._n)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.set_num_threads(self._saved_n)

Then use it like:

with MKLThreads(2):
    # do some stuff on two cores
    pass

Or just change the configuration by calling the class methods directly:

# Example
MKLThreads.set_num_threads(3)
print(MKLThreads.get_max_threads())

Code is also available in this gist.

For people looking for a cross-platform, packaged solution, note that we have recently released threadpoolctl, a module to limit the number of threads used in C-level thread pools called from Python (OpenBLAS, OpenMP and MKL). See this answer for more info.
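
A short sketch of how threadpoolctl might be used for the same purpose, assuming the package is installed. The wrapper blas_threads is a made-up helper that falls back to doing nothing when threadpoolctl is missing, so the snippet stays runnable either way.

```python
import contextlib

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is not installed
    threadpool_limits = None


def blas_threads(n):
    """Limit BLAS thread pools (MKL, OpenBLAS) to n threads inside a with block.

    Hypothetical helper: returns a do-nothing context manager when
    threadpoolctl is unavailable.
    """
    if threadpool_limits is None:
        return contextlib.nullcontext()
    return threadpool_limits(limits=n, user_api="blas")


with blas_threads(2):
    # numpy calls made here should use at most two BLAS threads
    pass
```

Unlike the ctypes approach, this does not require knowing the library file name or the symbol spelling; the previous limits are restored when the block exits.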
