Using mkl_set_num_threads with numpy

I'm trying to set the number of threads for numpy calculations with mkl_set_num_threads like this

import numpy
import ctypes
mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_rt.mkl_set_num_threads(4)

but I keep getting an segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00002aaab34d7561 in mkl_set_num_threads__ () from /../libmkl_intel_lp64.so

Getting the number of threads is no problem:

print mkl_rt.mkl_get_max_threads()

How can I get my code working? Or is there another way to set the number of threads at runtime?

Ophion led me the right way. Despite the documentation, one have to transfer the parameter of mkl_set_num_thread by reference.

Now I have defined to functions, for getting and setting the threads

import numpy
import ctypes
mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_get_max_threads = mkl_rt.mkl_get_max_threads
def mkl_set_num_threads(cores):
    mkl_rt.mkl_set_num_threads(ctypes.byref(ctypes.c_int(cores)))

mkl_set_num_threads(4)
print mkl_get_max_threads() # says 4

and they work as expected.

Edit: according to Rufflewind, the names of the C-Functions are written in capital-case, which expect parameters by value:

import ctypes

mkl_rt = ctypes.CDLL('libmkl_rt.so')
mkl_set_num_threads = mkl_rt.MKL_Set_Num_Threads
mkl_get_max_threads = mkl_rt.MKL_Get_Max_Threads

Long story short, use MKL_Set_Num_Threads and its CamelCased friends when calling MKL from Python. The same applies to C if you don't #include <mkl.h>.

The MKL documentation seems to suggest that the correct type signature in C is:

void mkl_set_num_threads(int nt);

Okay, let's try a minimal program then:

void mkl_set_num_threads(int);
int main(void) {
    mkl_set_num_threads(1);
    return 0;
}

Compile it with GCC and boom, Segmentation fault again. So it seems the problem isn't restricted to Python.

Running it through a debugger (GDB) reveals:

Program received signal SIGSEGV, Segmentation fault.
0x0000… in mkl_set_num_threads_ ()
   from /…/mkl/lib/intel64/libmkl_intel_lp64.so

Wait a second, mkl_set_num_threads_?? That's the Fortran version of mkl_set_num_threads! How did we end up calling the Fortran version? (Keep in mind that Fortran's calling convention requires arguments to be passed as pointers rather than by value.)

It turns out the documentation was a complete façade. If you actually inspect the header files for the recent versions of MKL, you will find this cute little definition:

void    MKL_Set_Num_Threads(int nth);
#define mkl_set_num_threads         MKL_Set_Num_Threads

… and now everything makes sense! The correct function do call (for C code) is MKL_Set_Num_Threads, not mkl_set_num_threads. Inspecting the symbol table reveals that there are actually four different variants defined:

nm -D /…/mkl/lib/intel64/libmkl_rt.so | grep -i mkl_set_num_threads
00000000000e3060 T MKL_SET_NUM_THREADS
…
00000000000e30b0 T MKL_Set_Num_Threads
…
00000000000e3060 T mkl_set_num_threads
00000000000e3060 T mkl_set_num_threads_
…

Why did Intel put in four different variants of one function despite there being only C and Fortran variants in the documentation? I don't know for certain, but I suspect it's for compatibility with different Fortran compilers. You see, Fortran calling convention is not standardized. Different compilers will mangle the names of the functions differently:

some use upper case,
some use lower case with a trailing underscore, and
some use lower case with no decoration at all.

There may even be other ways that I'm not aware of. This trick allows the MKL library to be used with most Fortran compilers without any modification, the downside being that C functions need to be "mangled" to make room for the 3 variants of the Fortran calling convention.

For people looking for the complete solution, you can use a context manager:

import ctypes


class MKLThreads(object):
    _mkl_rt = None

    @classmethod
    def _mkl(cls):
        if cls._mkl_rt is None:
            try:
                cls._mkl_rt = ctypes.CDLL('libmkl_rt.so')
            except OSError:
                cls._mkl_rt = ctypes.CDLL('mkl_rt.dll')
        return cls._mkl_rt

    @classmethod
    def get_max_threads(cls):
        return cls._mkl().mkl_get_max_threads()

    @classmethod
    def set_num_threads(cls, n):
        assert type(n) == int
        cls._mkl().mkl_set_num_threads(ctypes.byref(ctypes.c_int(n)))

    def __init__(self, num_threads):
        self._n = num_threads
        self._saved_n = self.get_max_threads()

    def __enter__(self):
        self.set_num_threads(self._n)
        return self

    def __exit__(self, type, value, traceback):
        self.set_num_threads(self._saved_n)

Then use it like:

with MKLThreads(2):
    # do some stuff on two cores
    pass

Or just manipulating configuration by calling following functions:

# Example
MKLThreads.set_num_threads(3)
print(MKLThreads.get_max_threads())

Code is also available in this gist.

For people looking for a cross platform and packaged solution, note that we have recently released threadpoolctl, a module to limit the number of threads used in C-level threadpools called by python (OpenBLAS, OpenMP and MKL). See this answer for more info.

来源：https://stackoverflow.com/questions/28283112/using-mkl-set-num-threads-with-numpy

标签

python

numpy

intel-mkl