Why does multiprocessing.Process behave differently on Windows and Linux for global objects and function arguments?

一整个雨季 2020-12-03 08:15

The following code produces different output on Windows and Linux (both with Python 2.7):

'''import_mock.py'''
to_mock = None
2 answers
  • 2020-12-03 08:31

    Adding to @Blckknght's answer: on Windows, each process imports the original module "from scratch". On Unix-y systems, only the main process runs the whole module; every other process sees whatever exists at the time fork() is used to create it. (No, you're not calling fork() yourself: multiprocessing internals call it whenever they create a new process.)

    In detail, for your import_mock:

    • On all platforms, the main process calls func(), which sets import_mock.to_mock to 1.

    • On Unix-y platforms, that's what all new processes see: the fork() occurs after that, so 1 is the state all new processes inherit.

    • On Windows, all new processes run the entire module "from scratch". So they each import their own, brand new version of import_mock. Only the main process calls func(), so only the main process sees to_mock change to 1. All other processes see the fresh None state.

    That's all expected, and actually easy to understand the second time ;-)

    What's going on with passing a (your A() instance) is subtler, because it depends more on multiprocessing implementation details. The implementation could have chosen to pickle arguments on all platforms from the start, but it didn't, and now it's too late to change without breaking stuff on some platforms.

    Because of copy-on-write fork() semantics, it wasn't necessary to pickle Process() arguments on Unix-y systems, and so the implementation never did. However, without fork() it is necessary to pickle them on Windows - and so the implementation does.

    Python 3.4 and later let you force "the Windows implementation" (spawn) on all platforms; before that, there's no mechanical way to avoid possible cross-platform surprises.

    But in practice, I've rarely been bothered by this. Knowing that, for example, multiprocessing can depend heavily on pickling, I stay completely clear of getting anywhere near playing tricks with pickles. The only reason you had "a problem" passing an A() instance is that you are playing pickle tricks (via overriding the default __getstate__()).
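To make the pickling point concrete, here is a hedged sketch that does not start any processes. It approximates the Windows/spawn path with a plain pickle round trip (multiprocessing uses its own pickler subclass, but __getstate__() is honoured the same way) and the fork path by handing back the same object. The class A below is a hypothetical stand-in, since the question's class body is not shown, and the two helper functions are made-up names.

```python
import pickle


class A(object):
    """Hypothetical stand-in for the question's class A."""

    def __init__(self):
        self.value = 1
        self.secret = "only in the parent"

    def __getstate__(self):
        # The "pickle trick": leave one attribute out of the pickled state.
        state = self.__dict__.copy()
        del state["secret"]
        return state


def as_seen_after_fork(obj):
    # fork: the child inherits a copy of the parent's memory, so the
    # argument is never pickled and keeps all of its attributes.
    return obj


def as_seen_after_spawn(obj):
    # spawn: Process arguments travel to the child through pickle, so
    # __getstate__() decides what the child actually receives.
    return pickle.loads(pickle.dumps(obj))


a = A()
print(hasattr(as_seen_after_fork(a), "secret"))   # True
print(hasattr(as_seen_after_spawn(a), "secret"))  # False
```

Without a custom __getstate__() both calls would yield equivalent objects, which is why plain argument types rarely show a platform difference.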

  • 2020-12-03 08:44

    On Linux (and other Unix-like OSs), Python's multiprocessing module uses fork() to create new child processes that efficiently inherit a copy of the parent process's memory state. That means the interpreter doesn't need to pickle the objects that are being passed as the Process's args, since the child process will already have them available in their normal form.

    Windows doesn't have a fork() system call, however, so the multiprocessing module needs to do a bit more work to make the child-spawning process work. The fork()-based implementation came first, and the non-forking Windows implementation came later.

    It's worth noting that the Python developers had long felt it was a bit of a misfeature for the creation of child processes to differ so much based on the platform you're running Python on. So in Python 3.4, a new system was added to allow you to select the start method you prefer. The options are "fork", "forkserver" and "spawn". In 3.4, "fork" remained the default on Unix-like systems (where it was the only implementation in earlier versions of Python); later releases changed some defaults (macOS has defaulted to "spawn" since Python 3.8), so check the documentation for your version. The "spawn" method is the default (and only) option on Windows, but can now be used on Unix-like systems too. The "forkserver" method is sort of a hybrid between the two (and only available on some Unix-like systems). You can read more about the differences between the methods in the documentation.
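Selecting a start method might look like the sketch below, assuming Python 3.4+. The calls get_all_start_methods(), get_start_method() and get_context() are part of the multiprocessing module; greet and run_child are names made up for this example.

```python
import multiprocessing as mp


def greet(method):
    print("hello from a child started with", method)


def run_child(method):
    """Start one child using the given start method; return its exit code."""
    ctx = mp.get_context(method)  # a context bound to one start method
    proc = ctx.Process(target=greet, args=(method,))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    print("available:", mp.get_all_start_methods())
    print("default:", mp.get_start_method())
    # "spawn" exists on every platform, so forcing it gives the same
    # import-from-scratch behaviour on Unix-like systems as on Windows.
    print("exit code:", run_child("spawn"))
```

Using get_context() keeps the choice local to the code that needs it. mp.set_start_method() changes the default for the whole program instead and should be called at most once, inside the main guard.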
