Python multiprocessing apply_async “assert left > 0” AssertionError

轻奢々 2021-01-06 23:42

I am trying to load numpy files asynchronously in a Pool:

self.pool = Pool(2, maxtasksperchild=1)
...
nextPackage = self.pool.apply_async(loadPackages, (..

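For reference, here is a minimal sketch of the failing pattern (the worker function body and the array size are illustrative, not my actual code): on affected Python versions, the AssertionError surfaces when the parent process reads back the worker's oversized result.

import numpy as np
from multiprocessing import Pool

def loadPackages(index):
    # stand-in for the real loader: returns an array whose pickled
    # size is well over 2 GB (300M float64 values, roughly 2.4 GB)
    return np.zeros(300_000_000, dtype=np.float64)

if __name__ == '__main__':
    pool = Pool(2, maxtasksperchild=1)
    nextPackage = pool.apply_async(loadPackages, (0,))
    data = nextPackage.get()  # receiving the oversized result triggers
                              # AssertionError: assert left > 0
    pool.close()
    pool.join()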
2 Answers
  • 2021-01-07 00:21

    I think I've found a workaround: retrieve the data in small chunks. In my case the result was a list of lists.

    I had:

    for i in range(0, NUMBER_OF_THREADS):
        print('MAIN: Getting data from process ' + str(i) + ' proxy...')
        # copy the whole Manager list proxy in one call -- the entire
        # list is pickled and sent through the pipe as a single message
        X_train.extend(ListasX[i]._getvalue())
        Y_train.extend(ListasY[i]._getvalue())
        ListasX[i] = None
        ListasY[i] = None
        gc.collect()
    

    Changed to:

    CHUNK_SIZE = 1024
    for i in range(0, NUMBER_OF_THREADS):
        print('MAIN: Getting data from process ' + str(i) + ' proxy...')
        # fetch the proxy contents slice by slice so that each pickled
        # transfer through the pipe stays small
        for k in range(0, len(ListasX[i]), CHUNK_SIZE):
            X_train.extend(ListasX[i][k:k+CHUNK_SIZE])
            Y_train.extend(ListasY[i][k:k+CHUNK_SIZE])
        ListasX[i] = None  # release the proxies so their memory can be reclaimed
        ListasY[i] = None
        gc.collect()
    

    And now it seems to work, most likely because less data is serialized at a time. So if you can segment your data into smaller portions, you should be able to overcome the issue. Good luck!
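    The same idea as a small reusable helper (the helper name is mine, not from the code above): it yields slices of a Manager list proxy so that each transfer across the process boundary stays small.

    def iter_chunks(proxy_list, chunk_size=1024):
        # yield successive slices of a multiprocessing Manager list proxy;
        # each slice is pickled and sent through the pipe as its own
        # small message instead of one huge one
        for start in range(0, len(proxy_list), chunk_size):
            yield proxy_list[start:start + chunk_size]

    for chunk in iter_chunks(ListasX[i], CHUNK_SIZE):
        X_train.extend(chunk)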

  • 2021-01-07 00:40

    There is a bug in the Python core code that prevents data responses bigger than 2 GB from being returned correctly to the main process. You need to either split the data into smaller chunks, as suggested in the previous answer, or avoid using multiprocessing for this function.
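    If splitting is not practical, one way to avoid the transfer entirely (a sketch, assuming the worker produces a large numpy array; the function and file names are hypothetical) is to have the worker save its result to disk and return only the file path, so nothing large ever crosses the pipe:

    import os
    import tempfile
    import numpy as np
    from multiprocessing import Pool

    def load_package(path):
        data = np.load(path)  # the heavy work happens in the worker
        out = os.path.join(tempfile.gettempdir(),
                           os.path.basename(path) + '.out.npy')
        np.save(out, data)    # persist the big result to disk
        return out            # only a short string crosses the pipe

    if __name__ == '__main__':
        with Pool(2) as pool:
            result_path = pool.apply_async(load_package, ('package0.npy',)).get()
            data = np.load(result_path)  # the parent reads the result from disk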

    I reported this bug on the Python bug tracker (https://bugs.python.org/issue34563) and created a PR (https://github.com/python/cpython/pull/9027) to fix it, but it will probably take a while to get released (UPDATE: the fix is present in Python 3.8.0+).

    If you are interested, you can find more details on what causes the bug in the bug report linked above.
