Are Generators Threadsafe?

闹比i 2020-12-02 09:29

I have a multithreaded program where I create a generator function and then pass it to new threads. I want it to be shared/global in nature so each thread can get the next

6 Answers
  • 2020-12-02 09:39

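    It depends on which Python implementation you're using. In CPython, the GIL allows only one thread to execute Python bytecode at any given time, so individual operations on Python objects are thread-safe. That guarantee covers single operations only; it does not make a sequence of operations atomic.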

    http://en.wikipedia.org/wiki/Global_Interpreter_Lock

  • 2020-12-02 09:46

    Courtesy of, IIRC, the Python channel on freenode, here are working solutions for Python 3.x.

    Generators are not thread-safe by default, but here's how to make them thread-safe:

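    import threading

    def my_generator():
        # Endlessly cycle through 0..9.
        while True:
            for x in range(10):
                yield x


    class LockedIterator(object):
        """Wrap an iterator so only one thread at a time can advance it."""

        def __init__(self, it):
            self._lock = threading.Lock()
            self._it = iter(it)

        def __iter__(self):
            return self

        def __next__(self):
            # Hold the lock for the whole call so the generator never runs in two threads at once.
            with self._lock:
                return next(self._it)

    # Call the generator function; passing my_generator itself would fail in iter().
    n = LockedIterator(my_generator())

    next(n)
    next(n)
    next(n)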
    

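    Or use a generator function:

    def threadsafe_iter(iterable):
        lock = threading.Lock()
        iterator = iter(iterable)
        while True:
            with lock:
                # Pull exactly one value while holding the lock.
                for value in iterator:
                    break
                else:
                    # The underlying iterator is exhausted.
                    return
            yield value

    # Caveat: the wrapper is itself a generator, so two threads calling next(n)
    # at the same moment can still raise "ValueError: generator already executing".
    n = threadsafe_iter(my_generator())

    next(n)
    next(n)
    next(n)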
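
A quick way to check the class-based wrapper is to share one instance across several threads and confirm that every value is handed out exactly once. The sketch below assumes a finite generator and uses hypothetical names (`numbers`, `consume`, `results`) that are not part of the answer above.

```python
import threading


class LockedIterator:
    """Same idea as the class above: serialize next() with a lock."""

    def __init__(self, it):
        self._lock = threading.Lock()
        self._it = iter(it)

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)


def numbers(limit):
    # Hypothetical finite generator used only for this check.
    for x in range(limit):
        yield x


shared = LockedIterator(numbers(1000))
results = []  # list.append is safe to call from several threads in CPython


def consume():
    # Each worker drains the shared iterator until it is exhausted.
    for value in shared:
        results.append(value)


threads = [threading.Thread(target=consume) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every value should appear exactly once, although arrival order may vary.
print(sorted(results) == list(range(1000)))
```

The order in which threads receive values is not deterministic, which is why the check sorts the results before comparing.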
    
  • 2020-12-02 09:50

    Edited to add benchmark below.

    You can wrap a generator with a lock. For example,

    import threading

    class LockedIterator(object):
        def __init__(self, it):
            self.lock = threading.Lock()
            self.it = iter(it)

        def __iter__(self):
            return self

        def __next__(self):
            # Python 3 spelling of the iterator protocol (Python 2 used next()).
            with self.lock:
                return next(self.it)

    # A generator expression, so the wrapper really is guarding a generator.
    gen = (x*2 for x in [1, 2, 3, 4])
    g2 = LockedIterator(gen)
    print(list(g2))
    

    Locking takes 50ms on my system, Queue takes 350ms. Queue is useful when you really do have a queue; for example, if you have incoming HTTP requests and you want to queue them for processing by worker threads. (That doesn't fit in the Python iterator model--once an iterator runs out of items, it's done.) If you really do have an iterator, then LockedIterator is a faster and simpler way to make it thread safe.

    from datetime import datetime
    from queue import Queue
    import threading

    num_worker_threads = 4

    class LockedIterator(object):
        def __init__(self, it):
            self.lock = threading.Lock()
            self.it = iter(it)

        def __iter__(self):
            return self

        def __next__(self):
            with self.lock:
                return next(self.it)

    def test_locked(it):
        # Workers pull items straight from one shared, lock-protected iterator.
        it = LockedIterator(it)

        def worker():
            try:
                for i in it:
                    pass
            except Exception as e:
                print(e)
                raise

        threads = []
        for i in range(num_worker_threads):
            t = threading.Thread(target=worker)
            threads.append(t)
            t.start()

        for t in threads:
            t.join()

    def test_queue(it):
        # The main thread feeds a Queue; daemon workers drain it.
        def worker():
            try:
                while True:
                    item = q.get()
                    q.task_done()
            except Exception as e:
                print(e)
                raise

        q = Queue()
        for i in range(num_worker_threads):
            t = threading.Thread(target=worker)
            t.daemon = True
            t.start()

        for item in it:
            q.put(item)

        q.join()

    start_time = datetime.now()
    it = [x*2 for x in range(1, 10000)]

    test_locked(it)
    #test_queue(it)
    end_time = datetime.now()
    took = end_time - start_time
    print("took %.01f ms" % (took.total_seconds() * 1000))
    
  • 2020-12-02 09:56

    It's not thread-safe; simultaneous calls may interleave, and mess with the local variables.

    The common approach is to use the master-slave pattern (now called farmer-worker pattern in PC). Make a third thread which generates data, and add a Queue between the master and the slaves, where slaves will read from the queue, and the master will write to it. The standard queue module provides the necessary thread safety and arranges to block the master until the slaves are ready to read more data.
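
The queue-based approach described here might look like the following sketch. It assumes a bounded `queue.Queue`, a `None` sentinel to signal the end of the data, and hypothetical names (`produce`, `consume`, `SENTINEL`, `NUM_WORKERS`); none of these come from the answer itself.

```python
import queue
import threading

NUM_WORKERS = 4
SENTINEL = None  # assumption: None never occurs as a real item


def my_generator():
    # Stand-in data source for this sketch.
    for x in range(100):
        yield x


def produce(q):
    # Only this thread ever touches the generator.
    for item in my_generator():
        q.put(item)  # blocks while the queue is full
    for _ in range(NUM_WORKERS):
        q.put(SENTINEL)  # one stop marker per worker


def consume(q, out):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        out.append(item * 2)  # placeholder for real work


q = queue.Queue(maxsize=10)
processed = []
workers = [
    threading.Thread(target=consume, args=(q, processed))
    for _ in range(NUM_WORKERS)
]
producer = threading.Thread(target=produce, args=(q,))
for t in workers + [producer]:
    t.start()
for t in workers + [producer]:
    t.join()

print(sorted(processed) == [x * 2 for x in range(100)])
```

Because the generator is advanced by a single thread, it needs no lock of its own; the bounded queue is what throttles the producer when the workers fall behind.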

  • 2020-12-02 09:58

    No, they are not thread-safe. You can find interesting info about generators and multi-threading in:

    http://www.dabeaz.com/generators/Generators.pdf

  • The generator object itself is thread-safe, like any PyObject protected by the GIL. But a thread that tries to get the next element from a generator that is already executing in another thread (running the generator code between the yields) gets a ValueError:

    ValueError: generator already executing
    

    Sample code:

    from threading import Thread
    from time import sleep
    
    def gen():
        sleep(1)
        yield
    
    g = gen()
    
    Thread(target=g.__next__).start()
    Thread(target=g.__next__).start()
    

    Results in:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    ValueError: generator already executing
    

    Actually, this is not related to threading at all, and it can be reproduced inside a single thread:

    def gen():
        yield next(g)
    
    g = gen()
    
    next(g)
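
Run as written, the snippet above stops with a traceback. A small variation that catches the exception so the message can be inspected might look like this; the `message` variable is an illustrative addition, not part of the original answer.

```python
def gen():
    # Re-enters the same generator while it is still running.
    yield next(g)


g = gen()

try:
    next(g)
except ValueError as exc:
    message = str(exc)
    print(message)
```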
    