Why does a script that uses threads occasionally print extra lines?

Posted by ⅰ亾dé卋堺 on 2019-12-28 16:03:30

Question


The script below occasionally prints extra empty lines. If "print s" is replaced by "print >>sys.stderr, s", the effect vanishes.

import random, sys, time
import threading

lock = threading.Lock()

def echo(s):
    time.sleep(1e-3*random.random()) # instead of threading.Timer()
    with lock:
        print s

for c in 'abc':
    threading.Thread(target=echo, args=(c,)).start()

Example

# Run until empty line is found:
$ while ! python example.py 2>&1|tee out|grep '^$';do echo -n .;done;cat out

Output

....................
b

c
a

The output should not contain empty lines, but it does. I understand that print is not thread-safe, but I would've thought the lock should help.

The question is: why does this happen?

My machine:

$ python -mplatform
Linux-2.6.38-11-generic-x86_64-with-Ubuntu-11.04-natty

Extra lines are printed on py26, py27, pypy.

py24, py25, py31, py32 behave as expected (no empty lines).
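
When comparing interpreters, a bounded Python equivalent of the shell loop above may be more convenient, since the shell loop never terminates on a version that does not show the problem. The following is only a sketch: find_empty_line and max_runs are hypothetical names that do not appear in the original post.

```python
import subprocess
import sys

def find_empty_line(cmd, max_runs=100):
    """Run cmd repeatedly, merging stderr into stdout (like 2>&1).

    Return the output lines of the first run that contains an empty
    line, or None if no such run occurs within max_runs attempts.
    """
    for _ in range(max_runs):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        out, _unused = proc.communicate()
        lines = out.decode("utf-8", "replace").splitlines()
        if "" in lines:  # same check as grep '^$'
            return lines
    return None
```

For example, find_empty_line([sys.executable, "example.py"]) checks the interpreter running the harness; replacing sys.executable with the path of another interpreter checks that one instead.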

Variations

  • sys.stdout.flush() after the print doesn't solve the problem:

    with lock:
        print(s)
        sys.stdout.flush()
    
  • even stranger, plain sys.stdout.write() does not produce empty lines when used under the lock:

    with lock:
        sys.stdout.write(s)
        sys.stdout.write('\n') #NOTE: no .flush()
    
  • print function works as expected (no empty lines).
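
Building on the sys.stdout.write() variation, one possible tightening is to emit the text and the newline in a single write call while holding the lock. This is a sketch written for Python 3 and has not been verified on the interpreters listed above; the stream parameter and the run helper are hypothetical additions so that the output can be captured and inspected.

```python
import io
import random
import sys
import threading
import time

lock = threading.Lock()

def echo(s, stream=None):
    # `stream` is a hypothetical parameter; it defaults to sys.stdout.
    out = sys.stdout if stream is None else stream
    time.sleep(1e-3 * random.random())
    with lock:
        out.write(s + "\n")  # text and newline go out in one write call

def run(chars, stream):
    # Start one thread per character and wait for all of them to finish.
    threads = [threading.Thread(target=echo, args=(c, stream)) for c in chars]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

buf = io.StringIO()
run("abc", buf)
lines = buf.getvalue().splitlines()  # 'a', 'b', 'c' in some order
```

Because each line reaches the stream as one string, no other writer can insert anything between the text and its newline.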

To reproduce, download the files and run:

$ tox

Answer 1:


Take a look at this Stack Overflow thread: "How do I get a thread safe print in Python 2.6?". Apparently, printing to stdout is not thread-safe.

If you turn on verbose threading, you can see this better:

threading.Thread(target=echo, args=(c,), verbose=True).start()

I get output like this:

MainThread: <Thread(Thread-1, initial)>.start(): starting thread
Thread-1: <Thread(Thread-1, started 6204)>.__bootstrap(): thread started
MainThread: <Thread(Thread-2, initial)>.start(): starting thread
Thread-2: <Thread(Thread-2, started 3752)>.__bootstrap(): thread started
MainThread: <Thread(Thread-3, initial)>.start(): starting thread
Thread-3: <Thread(Thread-3, started 4412)>.__bootstrap(): thread started
MainThread: <Thread(Thread-2, started 3752)>.join(): waiting until thread stops
a
b
Thread-1: <Thread(Thread-1, started 6204)>.__bootstrap(): normal return
Thread-2: <Thread(Thread-2, started 3752)>.__bootstrap(): normal return
MainThread: <Thread(Thread-2, stopped 3752)>.join(): thread stopped
MainThread: <Thread(Thread-3, started 4412)>.join(): waiting until thread stops
Thread-3: <Thread(Thread-3, started 4412)>.__bootstrap(): normal return
MainThread: <Thread(Thread-3, stopped 4412)>.join(): thread stopped
c

You can see that thread 3 is shown as finishing before printing the 'c' character. This clearly cannot be the case, so this leads me to assume that printing to the console is not thread-safe.

This, however, does not explain why printing to sys.stderr appears to work correctly.




Answer 2:


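Because print first writes the text to stdout and then writes the end string as a separate step. A simplified sketch to explain (not the actual implementation):

import sys

def print_sketch(*args, **kwargs):
    end = kwargs.get('end', '\n')               # usually a newline "\n"
    sys.stdout.write(' '.join(map(str, args)))  # first write: the text
    sys.stdout.write(end)                       # second write: the end string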

So, with multiple threads, the first write of both threads can execute before either thread's second write, and the two newlines are then printed back to back. But why don't the characters end up on the same line? I don't know; the implementation of Python's print needs to be examined more closely.
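
The interleaving described in this answer can be forced deterministically in a small Python 3 sketch. A threading.Barrier makes both threads finish their first write before either performs the second one, and no lock is used. The names two_step_print and interleaved_output are hypothetical. Note that this yields both characters on one line followed by an empty line, which is not the output shown in the question, where each character stays on its own line and a lock is held.

```python
import io
import threading

def two_step_print(stream, text, barrier):
    stream.write(text)   # step 1: the text
    barrier.wait()       # wait until the other thread has also done step 1
    stream.write("\n")   # step 2: the end string

def interleaved_output():
    stream = io.StringIO()
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=two_step_print, args=(stream, c, barrier))
        for c in "ab"
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stream.getvalue()

result = interleaved_output()  # "ab\n\n" or "ba\n\n"
```

So an unlocked two-step print accounts for an extra empty line, but it does not by itself account for the behavior observed with the lock in place.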



Source: https://stackoverflow.com/questions/7687862/why-a-script-that-uses-threads-prints-extra-lines-occasionally
