I understand that threads in Python use the same instance of the Python interpreter. My question is: is it the same with a process created by os.fork? Or does each process cr…
os.fork() is equivalent to the fork() syscall on many Unix-like systems. So yes, your child process(es) will be separate from the parent and have a different interpreter (as such).
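To see that the child really runs its own copy of the interpreter, here is a minimal sketch (Unix-only, since os.fork is not available on Windows). The names `counter` and `fork_and_increment` are made up for illustration: the child changes a module-level variable and reports the new value through its exit status, while the parent's copy stays untouched.

```python
import os

counter = 0  # module-level state that both processes start out with

def fork_and_increment():
    """Fork; the child bumps its own copy of `counter` and reports it via its exit status."""
    global counter
    pid = os.fork()
    if pid == 0:
        # Child: this only modifies the child's copy of the interpreter state.
        counter += 100
        # os._exit avoids flushing stdio buffers and running cleanup copied from the parent.
        os._exit(counter)  # exit statuses are limited to 0-255
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return counter, os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent_value, child_value = fork_and_increment()
    print(parent_value, child_value)  # → 0 100
```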
man fork:
FORK(2)
NAME
    fork - create a child process
SYNOPSIS
    #include
    pid_t fork(void);
DESCRIPTION
    fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points:
pydoc os.fork():
os.fork()
    Fork a child process. Return 0 in the child and the child’s process id in the parent. If an error occurs, OSError is raised.
    Note that some platforms including FreeBSD <= 6.3, Cygwin and OS/2 EMX have known issues when using fork() from a thread.
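The return-value convention in the pydoc entry can be sketched like this (Unix-only; `run_child` is a hypothetical helper name). The child exits immediately with a chosen status via os._exit, so it never runs the parent's remaining code, and the parent collects that status with os.waitpid. os.waitstatus_to_exitcode assumes Python 3.9 or later.

```python
import os
import sys

def run_child(exit_code):
    """Fork a child that exits with `exit_code`; return (child_pid, exit code seen by parent)."""
    try:
        pid = os.fork()
    except OSError as exc:
        # A failed fork raises OSError; no child process exists in this case.
        print(f"fork failed: {exc}", file=sys.stderr)
        raise
    if pid == 0:
        # os.fork() returned 0: this branch runs only in the child.
        os._exit(exit_code)
    # Non-zero return value: this is the parent, and `pid` is the child's process id.
    _, status = os.waitpid(pid, 0)
    return pid, os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    child_pid, code = run_child(7)
    print(f"child {child_pid} exited with status {code}")
```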
See also: Martin Konecny's answer on the whys and advantages of "forking" :)
For brevity, other approaches to concurrency that don't involve a separate process (and therefore a separate Python interpreter) include:
While fork does indeed create a copy of the current Python interpreter rather than running with the same one, it usually isn't what you want, at least not on its own. Among other problems:
- … fork after initialization that isn't true. Most infamously, if you let ssl seed its PRNG in the main process, then fork, you now have potentially predictable random numbers, which is a big hole in your security.
- … fork and an exec. If you never call exec, you can only use those syscalls, which basically means you can't do anything portably.
- … fork.
See POSIX fork or your platform's man page for details on these issues.
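The surviving fragment about pairing fork with an exec corresponds to the classic fork-then-exec pattern. A minimal sketch (Unix-only; `run_in_fresh_interpreter` is a hypothetical name) in which the child does nothing but exec a brand-new Python interpreter, so none of the parent's copied Python-level state (such as an already-seeded PRNG) is visible to the code it runs:

```python
import os
import sys

def run_in_fresh_interpreter(code):
    """Fork, then exec a new Python interpreter that runs `code`; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: call exec right away, keeping the window between fork and exec minimal.
        try:
            # execv replaces this process image (the copied interpreter) with a fresh one.
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            # Reached only if execv failed; never fall through into the parent's code.
            os._exit(127)
    # Parent: wait for the new interpreter to finish.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print(run_in_fresh_interpreter("import sys; sys.exit(3)"))  # → 3
```

In everyday code, subprocess.run does this for you (using fork and exec, or posix_spawn where possible) and is usually the better choice.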
The right answer is almost always to use multiprocessing, or concurrent.futures (which wraps up multiprocessing), or a similar third-party library.
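A minimal sketch of the concurrent.futures route, using the standard-library ProcessPoolExecutor. The helper names `square` and `parallel_squares` are illustrative, and the optional `start_method` parameter is an addition of this sketch so a caller can pick the multiprocessing start method explicitly:

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Executed in a worker process, i.e. in a separate interpreter.
    return n * n

def parallel_squares(numbers, start_method=None):
    """Square `numbers` in a pool of worker processes; results keep the input order."""
    ctx = multiprocessing.get_context(start_method)  # None means the platform default
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        return list(executor.map(square, numbers))

if __name__ == "__main__":
    # Keep pool creation under this guard: with the spawn and forkserver start
    # methods, the main module may be re-imported in other processes.
    print(parallel_squares(range(5)))  # → [0, 1, 4, 9, 16]
```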
With Python 3.4+, you can even specify a start method. The fork method basically just calls fork. The forkserver method runs a single "clean" process (no threads, signal handlers, SSL initialization, etc.) and forks off new children from that. The spawn method calls fork then exec, or an equivalent like posix_spawn, to get you a brand-new interpreter instead of a copy. So you can start off with fork, but then, if there are any problems, switch to forkserver or spawn, and nothing else in your code has to change (provided it already meets those methods' requirements, such as picklable arguments and an if __name__ == "__main__" guard), which is pretty nice.
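Switching start methods can be sketched with multiprocessing.get_context, which returns a context object with the same API as the multiprocessing module (multiprocessing.set_start_method is the global alternative). The helper `squares_with` is hypothetical; it maps the builtin pow so the task function is importable in a child started with any method:

```python
import multiprocessing

def squares_with(method, numbers):
    """Square `numbers` in a Pool whose workers are started with `method`."""
    # get_context raises ValueError if `method` is not available on this platform.
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=2) as pool:
        # pow is a builtin, so it pickles by reference under fork, forkserver and spawn.
        return pool.starmap(pow, [(n, 2) for n in numbers])
```

Called under an if __name__ == "__main__" guard, squares_with("fork", range(5)) and squares_with("spawn", range(5)) both return [0, 1, 4, 9, 16]; only the method name changes.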
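Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, the current stack, etc.) to create a second process, which is one reason forking a process is much more expensive than creating a thread. (Modern kernels make this copy lazily with copy-on-write, so pages are physically copied only when one of the processes writes to them.)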
This creates a new copy of the Python interpreter.
One advantage of having two Python interpreters running is that you now have two GILs (Global Interpreter Locks), and can therefore get true parallelism on a multi-core system.
Threads in one process share the same GIL, so (in standard CPython builds) only one of them executes Python bytecode at any given moment, which gives only the illusion of parallelism for CPU-bound code.
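One way to observe the "separate interpreter, separate GIL" point is to ask each task for its process id: thread-pool tasks all run inside the parent process (one interpreter, one GIL), while process-pool tasks run in other processes, each with its own interpreter and GIL. A minimal sketch; the helper names are made up for illustration:

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def report_pid(_):
    # Report which OS process (and therefore which interpreter) ran this task.
    return os.getpid()

def thread_pids(n_tasks=8):
    """PIDs seen by tasks in a thread pool: always just the parent's PID."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        return set(executor.map(report_pid, range(n_tasks)))

def process_pids(n_tasks=8, start_method=None):
    """PIDs seen by tasks in a process pool: the worker processes' PIDs."""
    ctx = multiprocessing.get_context(start_method)  # None means the platform default
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        return set(executor.map(report_pid, range(n_tasks)))

if __name__ == "__main__":
    print("parent:   ", os.getpid())
    print("threads:  ", thread_pids())
    print("processes:", process_pids())
```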