Empty Python process hangs on join [sys.stderr.flush()]

Submitted on 2021-02-07 08:21:37

Question


Python gurus, I need your help. I've run into quite strange behavior: an empty Python Process hangs on join. It looks like the fork inherits some locked resource.

Env:

  • Python version: 3.5.3
  • OS: Ubuntu 16.04.2 LTS
  • Kernel: 4.4.0-75-generic

Problem description:

1) I have a logger with a background thread that handles messages from a queue. The logger source code (a little simplified) is sketched below.
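
A minimal reconstruction of my_logging, based only on the names used elsewhere in this thread (get_logger, QueueHandler, QueueListener); the real source was linked in the original post, so the details here are assumptions:

import logging
import threading
from multiprocessing import Queue  # this choice is discussed in Answer 2


class QueueHandler(logging.Handler):
    """Put log records on a queue instead of writing them directly."""
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def emit(self, record):
        self.queue.put_nowait(record)


class QueueListener(object):
    """Background thread that drains the queue into a real handler."""
    _sentinel = None

    def __init__(self, queue, handler):
        self.queue = queue
        self.handler = handler
        self._thread = threading.Thread(target=self._monitor, daemon=True)

    def start(self):
        self._thread.start()

    def _monitor(self):
        while True:
            record = self.queue.get()
            if record is self._sentinel:
                break
            self.handler.emit(record)

    def stop(self):
        self.queue.put_nowait(self._sentinel)
        self._thread.join()


def get_logger(name):
    queue = Queue()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(QueueHandler(queue))
    # StreamHandler defaults to sys.stderr, the stream implicated below.
    listener = QueueListener(queue, logging.StreamHandler())
    logger.start, logger.stop = listener.start, listener.stop
    return logger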

2) And I have a simple script that uses my logger (just enough code to demonstrate the problem):

import os
from multiprocessing import Process
from my_logging import get_logger


def func():
    pass


if __name__ == '__main__':

    logger = get_logger(__name__)
    logger.start()
    for _ in range(2):
        logger.info('message')

    proc = Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('TEST PROCESS JOINED: is_alive={0}'.format(proc.is_alive()))

    logger.stop()
    print('EXIT')

Sometimes this test script hangs on joining the process "proc" (as the script finishes), and the test process "proc" stays alive.

To reproduce the problem you can run the script in a loop:

$ for i in {1..100} ; do /opt/python3.5.3/bin/python3.5 test.py ; done

Investigation:

strace shows the following:

strace: Process 25273 attached
futex(0x2275550, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff

And I figured out the place where the process hangs: it hangs in the multiprocessing module, file process.py, line 269 (Python 3.5.3), while flushing STDERR:

...
267    util.info('process exiting with exitcode %d' % exitcode)
268    sys.stdout.flush()
269    sys.stderr.flush()
...

If line 269 is commented out, the script always completes successfully.

My thoughts:

By default, logging.StreamHandler uses sys.stderr as its stream.

If the process is forked while the logger is flushing data to STDERR, the child inherits the stream's lock in its locked state and later hangs when it flushes STDERR itself.
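
The suspected mechanism can be shown in isolation. This is a minimal, POSIX-only demonstration with illustrative names, unrelated to the logger code: a thread holds a lock across a fork, and the child, which inherits the lock in its locked state but not the thread that owns it, blocks forever trying to acquire it.

import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(1.0)   # hold the lock across the fork below

threading.Thread(target=hold_lock).start()
time.sleep(0.1)           # give the thread time to take the lock

pid = os.fork()
if pid == 0:
    # Child: the lock was copied locked, and its owner does not exist here,
    # so this acquire never returns.
    lock.acquire()
    os._exit(0)
else:
    os.waitpid(pid, 0)    # the parent hangs waiting for the deadlocked child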

Some workarounds that solve the problem:

  • Use Python 2.7. I can't reproduce this with Python 2.7; maybe the timing there just keeps me from hitting the problem.
  • Use a process instead of a thread to handle messages in the logger.

Do you have any ideas on this behavior? Where is the problem? Am I doing something wrong?


Answer 1:


It looks like this behaviour is related to this issue: http://bugs.python.org/issue6721 ("Locks in the standard library should be sanitized on fork").




Answer 2:


Question: Sometimes ... Test process "proc" stays alive.

I could only reproduce your

TEST PROCESS:0 JOINED: is_alive=True

by adding a time.sleep(5) to def func():. You use proc.join(timeout=3), so that is the expected behavior.
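
For reference, join(timeout=...) does not raise when the timeout expires; it simply returns with the child still running, which is exactly what is_alive=True reports. A minimal illustration:

from multiprocessing import Process
import time

if __name__ == '__main__':
    proc = Process(target=time.sleep, args=(5,))
    proc.start()
    proc.join(timeout=3)        # returns after ~3 s, the child keeps running
    print(proc.is_alive())      # True: the timeout expired, nothing failed
    proc.join()                 # block until the child actually finishes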

Conclusion:
Overloading your system (in my environment this starts at around 30 running processes) makes your proc.join(timeout=3) expire. You may want to rethink your test case for reproducing your problem.

One approach, I think, is fine-tuning your process/thread with some time.sleep(0.05) calls to give up a timeslice.


  1. You are using from multiprocessing import Queue; use from queue import Queue instead (see the consolidated sketch after this list).

    From the documentation:
    class multiprocessing.Queue
    A queue class for use in a multi-processing (rather than multi-threading) context.

  2. In class QueueHandler(logging.Handler):, prevent further calls to

    self.queue.put_nowait(record)

    once

    class QueueListener(object):
    ...
    def stop(self):
        ...

    has been called. Implement, for instance:

    class QueueHandler(logging.Handler):
        def __init__(self):
            self.stop = Event()
            ...
    
  3. In def _monitor(self): use only ONE while ... loop,
    and wait until self._thread has stopped:

    class QueueListener(object):
        ...
        def stop(self):
            self.handler.stop.set()
            while not self.queue.empty():
                time.sleep(0.5)
            # Don't use double flags
            #self._stop.set()
            self.queue.put_nowait(self._sentinel)
            self._thread.join()
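
Putting the three suggestions together, a consolidated sketch (applied to the my_logging structure assumed earlier; anything beyond what the answer itself shows is an assumption) might look like this:

import logging
import threading
import time
from queue import Queue  # (1) a thread queue, not multiprocessing.Queue


class QueueHandler(logging.Handler):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
        self.stop = threading.Event()      # (2) set once shutdown begins

    def emit(self, record):
        if not self.stop.is_set():         # (2) refuse records while stopping
            self.queue.put_nowait(record)


class QueueListener(object):
    _sentinel = None

    def __init__(self, queue, handler, target):
        self.queue = queue
        self.handler = handler             # the QueueHandler feeding the queue
        self.target = target               # the handler that actually writes
        self._thread = threading.Thread(target=self._monitor, daemon=True)

    def start(self):
        self._thread.start()

    def _monitor(self):
        while True:                        # (3) one loop, ended by the sentinel
            record = self.queue.get()
            if record is self._sentinel:
                break
            self.target.emit(record)

    def stop(self):
        self.handler.stop.set()            # stop accepting new records
        while not self.queue.empty():      # drain what is already queued
            time.sleep(0.5)
        self.queue.put_nowait(self._sentinel)
        self._thread.join()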


Source: https://stackoverflow.com/questions/44069717/empty-python-process-hangs-on-join-sys-stderr-flush
