Synchronizing embedded Python in a multi-threaded program

Question

Here is an example of using the Python interpreter in a multi-threaded program:

#include <Python.h>
#include <boost/thread.hpp>

void f(const char* code)
{
    static volatile auto counter = 0;
    for(; counter < 20; ++counter)
    {
        auto state = PyGILState_Ensure();
        PyRun_SimpleString(code);
        PyGILState_Release(state);

        boost::this_thread::yield();
    }
}

int main()
{
    Py_Initialize();
    PyEval_InitThreads();  // must follow Py_Initialize() on Python 3.2+; not needed since 3.7
    PyRun_SimpleString("x = 0\n");
    auto mainstate = PyEval_SaveThread();

    auto thread1 = boost::thread(f, "print('thread #1, x =', x)\nx += 1\n");
    auto thread2 = boost::thread(f, "print('thread #2, x =', x)\nx += 1\n");
    thread1.join();
    thread2.join();

    PyEval_RestoreThread(mainstate);
    Py_Finalize();
}

It looks fine, but it isn't synchronized: the Python interpreter releases and reacquires the GIL multiple times during PyRun_SimpleString (see docs, p.#2), so the snippets run by the two threads can interleave.
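To make that concrete, here is a small sketch (not part of the original program) that uses the standard dis module to list the bytecode instructions one of the snippets compiles to. The snippet is not a single indivisible step: x is read for the print call and only written back by a later instruction. At which of these boundaries CPython may hand the GIL to another thread depends on the interpreter version, so treat this as an illustration rather than a precise model.

```python
import dis

# The snippet that thread #1 runs in the example above.
snippet = "print('thread #1, x =', x)\nx += 1\n"

code = compile(snippet, "<string>", "exec")
instructions = list(dis.get_instructions(code))

# Opnames of the instructions that read or write the global name `x`.
x_accesses = [ins.opname for ins in instructions if ins.argval == "x"]

# Exact opcodes differ between CPython versions, so just list them.
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

The reads of x and the final store are separate instructions with a call to print in between, which is where another thread can get a turn.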

We could serialize the PyRun_SimpleString calls with our own synchronization object, but that is the wrong way to do it.

Python has its own synchronization modules, _thread and threading, but they don't work in this code:

Py_Initialize();
PyRun_SimpleString(R"(
import _thread
sync = _thread.allocate_lock()

x = 0
)");

auto mainstate = PyEval_SaveThread();

auto thread1 = boost::thread(f, R"(
with sync:
    print('thread #1, x =', x)
    x += 1
)");

It yields the following error and then deadlocks:

  File "<string>", line 3, in <module>
NameError: name '_[1]' is not defined

What is the most efficient way to synchronize embedded Python code?

Answer 1:

When CPython calls out to a function that may block (or re-enter Python), it releases the global interpreter lock before calling the function, and then re-acquires the lock after the function returns. In your code, it's your call to the built-in print function that causes the interpreter lock to be released and the other thread to run (see string_print in stringobject.c).

So you need your own lock: the global interpreter lock is not suitable for ensuring serialization of Python code that does I/O.

Since you're using the Boost thread framework, you'll probably find it most convenient to use one of the Boost thread synchronization primitives, e.g. boost::interprocess::interprocess_mutex.
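As a rough illustration of the idea, here is a pure-Python analogue: the host owns a lock and holds it for the whole duration of each snippet, the way a Boost mutex would be held around each PyRun_SimpleString call. The names host_lock, run_serialized and worker are made up for this sketch, and exec on a shared dictionary stands in for PyRun_SimpleString.

```python
import threading

# Lock owned by the host program; the snippets themselves never see it.
host_lock = threading.Lock()

# Shared namespace, standing in for the __main__ globals of the embedded interpreter.
namespace = {"x": 0, "seen": []}

def run_serialized(source):
    """Run one snippet to completion before any other snippet may start."""
    with host_lock:
        exec(source, namespace)

def worker(label, iterations):
    # Record (label, x) instead of printing, then increment x.
    source = f"seen.append(({label!r}, x))\nx += 1\n"
    for _ in range(iterations):
        run_serialized(source)

threads = [
    threading.Thread(target=worker, args=(label, 10))
    for label in ("thread #1", "thread #2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

observed = [value for _, value in namespace["seen"]]
```

Because each snippet runs entirely under the lock, every value of x is observed exactly once and in order. In the C++ host, the mutex would presumably have to be locked before PyGILState_Ensure: a thread that blocks on the mutex while holding the GIL can deadlock against the thread that holds the mutex and is waiting to reacquire the GIL.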

[Edited: my original answer was wrong, as pointed out by Abyx.]

Answer 2:

The with statement has an issue in Python 3.1, but it was fixed in Python 3.2 and Python 2.7.

So the right solution is to use the threading module for synchronization.
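A minimal sketch of that approach, assuming a Python version where the with statement works at module level: the lock lives in the same namespace as x, and the snippet itself takes it. Here exec on a shared dictionary stands in for PyRun_SimpleString, and the snippet records values in a list (log, a name invented for this sketch) instead of printing so the outcome is easy to check.

```python
import threading

# Shared namespace, standing in for the __main__ globals used by PyRun_SimpleString.
shared = {"sync": threading.Lock(), "x": 0, "log": []}

# Same shape as the snippet in the question: everything happens under `with sync:`.
SNIPPET = """
with sync:
    log.append(x)
    x += 1
"""

def worker(iterations):
    code = compile(SNIPPET, "<string>", "exec")
    for _ in range(iterations):
        exec(code, shared)

threads = [threading.Thread(target=worker, args=(10,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Unlike a lock held by the host, this one is released whenever the snippet leaves the with block, so long-running snippets can keep their critical sections small.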

To avoid such issues, one should either avoid multi-threaded code that keeps temporary variables in a shared globals dictionary, or use a separate globals dictionary for each thread.
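The second option might look like the sketch below, where each thread executes its snippets in its own globals dictionary and only the deliberately shared state is passed in by reference. Shared, worker, tmp and thread_name are hypothetical names for this example. In the C++ host, the counterpart would be something like PyRun_String with a per-thread dictionary, since PyRun_SimpleString always runs in __main__.

```python
import threading

class Shared:
    """Hypothetical holder for the state that all threads really must share."""
    def __init__(self):
        self.lock = threading.Lock()
        self.x = 0

SNIPPET = """
tmp = thread_name.upper()   # temporary: stored in this thread's globals only
with shared.lock:
    shared.x += 1
"""

def worker(name, shared, iterations, results):
    # A fresh globals dictionary per thread, so temporaries cannot collide.
    namespace = {"shared": shared, "thread_name": name}
    code = compile(SNIPPET, "<string>", "exec")
    for _ in range(iterations):
        exec(code, namespace)
    results[name] = namespace["tmp"]

shared = Shared()
results = {}
threads = [
    threading.Thread(target=worker, args=(f"thread-{i}", shared, 10, results))
    for i in (1, 2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread ends up with its own tmp, while the counter on the shared object is still updated under the lock.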

Source: https://stackoverflow.com/questions/4153140/synchronizing-embedded-python-in-multi-threaded-program

Tags: c++, python, multithreading