I've already solved my problem by moving the import to the top declarations, but it left me wondering: Why cant I use a module that was imported in '__main__'
in functions that are the targets of multiprocessing
?
For example:
import os
import multiprocessing as mp
def run(in_file, out_dir, out_q):
arcpy.RaterToPolygon_conversion(in_file, out_dir, "NO_SIMPIFY", "Value")
status = str("Done with "+os.path.basename(in_file))
out_q.put(status, block=False)
if __name__ == '__main__':
raw_input("Program may hang, press Enter to import ArcPy...")
import arcpy
q = mp.Queue()
_file = path/to/file
_dir = path/to/dir
# There are actually lots of files in a loop to build
# processes but I just do one for context here
p = mp.Process(target=run, args=(_file, _dir, q))
p.start()
# I do stuff with Queue below to status user
When you run this in IDLE it doesn't error at all...just keeps doing a Queue
check (which is good so not the problem). The problem is that when you run this in the CMD terminal (either OS or Python) it produces the error that arcpy
is not defined!
Just a curious topic.
The situation is different in unix-like systems and Windows. On the unixy systems, multiprocessing
uses fork
to create child processes that share a copy-on-write view of the parent memory space. The child sees the imports from the parent, including anything the parent imported under if __name__ == "__main__":
.
On windows, there is no fork, a new process has to be executed. But simply rerunning the parent process doesn't work - it would run the whole program again. Instead, multiprocessing
runs its own python program that imports the parent main script and then pickles/unpickles a view of the parent object space that is, hopefully, sufficient for the child process.
That program is the __main__
for the child process and the __main__
of the parent script doesn't run. The main script was just imported like any other module. The reason is simple: running the parent __main__
would just run the full parent program again, which mp
must avoid.
Here is a test to show what is going on. A main module called testmp.py
and a second module test2.py
that is imported by the first.
testmp.py
import os
import multiprocessing as mp
print("importing test2")
import test2
def worker():
print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(),
__name__, __file__))
if __name__ == "__main__":
print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(),
__name__, __file__))
print("running process")
proc = mp.Process(target=worker)
proc.start()
proc.join()
test2.py
import os
print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
__name__, __file__))
When run on Linux, test2 is imported once and the worker runs in the main module.
importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py
Under windows, notice that "importing test2" is printed twice - testmp.py was run two times. But "main pid" was only printed once - its __main__
wasn't run. That's because multiprocessing
changed the module name to __mp_main__
during import.
E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py
来源:https://stackoverflow.com/questions/43545179/why-does-importing-module-in-main-not-allow-multiprocessig-to-use-module