Multiple threads writing to the same CSV in Python

Backend · unresolved · 3 answers · 1808 views
栀梦 2020-12-08 23:53

I'm new to multi-threading in Python and am currently writing a script that appends to a CSV file. If I were to have multiple threads submitted to a concurrent.futures

3 Answers
  • 2020-12-08 23:58

    Way-late-to-the-party note: you can handle this a different way, with no locking, by having a single writer thread consume from a shared Queue, with rows pushed onto the Queue by the threads doing the processing.

    from threading import Thread
    from queue import Queue
    from concurrent.futures import ThreadPoolExecutor
    
    
    # CSV writer setup goes here
    
    queue = Queue()
    SENTINEL = object()  # unique marker telling the consumer to stop
    
    
    def consume():
        while True:
            i = queue.get()  # blocks until an item is available; no busy-waiting
    
            if i is SENTINEL:
                return
    
            # Row comes out of queue; CSV writing goes here
    
            print(i)
    
    
    consumer = Thread(target=consume, daemon=True)
    consumer.start()
    
    
    def produce(i):
        # Data processing goes here; row goes into queue
        queue.put(i)
    
    
    with ThreadPoolExecutor(max_workers=10) as executor:
        for i in range(5000):
            executor.submit(produce, i)
    
    # The executor has shut down, so all producers are done;
    # tell the consumer to finish, then wait for it.
    queue.put(SENTINEL)
    consumer.join()
    
  • 2020-12-09 00:05

    I am not sure whether csv.writer is thread-safe. The documentation doesn't say, so to be safe, if multiple threads use the same writer object, you should protect every access with a threading.Lock:

    # create the lock
    import threading
    csv_writer_lock = threading.Lock()
    
    def downloadThread(arguments......):
        # pass csv_writer_lock somehow
        # Note: use csv_writer_lock on *any* access
        # Some code.....
        with csv_writer_lock:
            writer.writerow(re.split(',', line.decode()))
    

    That being said, it may indeed be more elegant for the downloadThread to submit write tasks to an executor, instead of explicitly using locks like this.
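    That alternative can be sketched as follows: a single-worker executor serializes all writes, so no explicit lock is needed. This is a minimal illustration, not code from the answer; `write_row`, `download_thread`, the filename, and the sample data are all made up for the example.

    ```python
    import csv
    from concurrent.futures import ThreadPoolExecutor

    with open('out.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        write_pool = ThreadPoolExecutor(max_workers=1)  # the single writer thread

        def write_row(row):
            # Only ever runs on write_pool's one worker, so calls are serialized
            writer.writerow(row)

        def download_thread(i):
            # ... download / process data here (illustrative) ...
            write_pool.submit(write_row, [i, i * 2])

        with ThreadPoolExecutor(max_workers=4) as workers:
            for i in range(10):
                workers.submit(download_thread, i)

        # All download threads have finished; wait for pending writes to flush
        write_pool.shutdown(wait=True)
    ```

    Rows may land in any order (whichever download finishes first submits first), but each row is written atomically because only one thread touches the writer.
    
    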

  • 2020-12-09 00:08

    Here is some code that also handles the headache-causing Unicode issue (note it targets Python 2, where str and unicode are distinct types):

    import csv
    import threading
    
    
    def ensure_bytes(s):
        return s.encode('utf-8') if isinstance(s, unicode) else s
    
    
    class ThreadSafeWriter(object):
        '''
        >>> from StringIO import StringIO
        >>> f = StringIO()
        >>> wtr = ThreadSafeWriter(f)
        >>> wtr.writerow(['a', 'b'])
        >>> f.getvalue() == "a,b\\r\\n"
        True
        '''
    
        def __init__(self, *args, **kwargs):
            self._writer = csv.writer(*args, **kwargs)
            self._lock = threading.Lock()
    
        def _encode(self, row):
            return [ensure_bytes(cell) for cell in row]
    
        def writerow(self, row):
            row = self._encode(row)
            with self._lock:
                return self._writer.writerow(row)
    
        def writerows(self, rows):
            rows = (self._encode(row) for row in rows)
            with self._lock:
                return self._writer.writerows(rows)
    
    # example (on Python 2, open CSV files in binary mode):
    with open('some.csv', 'wb') as f:
        writer = ThreadSafeWriter(f)
        writer.writerow([u'中文', 'bar'])
    

    a more detailed solution is here
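    On Python 3 every str is already Unicode, so the encoding step above is unnecessary and the same lock-per-call idea reduces to a much shorter class. A sketch under that assumption (not part of the original answer; the class and file names are illustrative):

    ```python
    import csv
    import threading


    class LockedWriter:
        """Wraps csv.writer so each writerow call holds a lock."""

        def __init__(self, f):
            self._writer = csv.writer(f)
            self._lock = threading.Lock()

        def writerow(self, row):
            with self._lock:
                self._writer.writerow(row)


    # newline='' is what the csv module expects for text-mode files on Python 3
    with open('safe.csv', 'w', newline='', encoding='utf-8') as f:
        w = LockedWriter(f)
        w.writerow(['中文', 'bar'])  # Unicode works out of the box
    ```
    
    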
