How to use boto3 client with Python multiprocessing?

滥情空心 2021-01-12 05:11

Code looks something like this:

import multiprocessing as mp
from functools import partial

import boto3
import numpy as np


s3 = boto3.client('s3')

def          


        
2 answers
  •  说谎 (original poster)
    2021-01-12 05:59

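    Objects passed to pool.starmap() must be picklable, because multiprocessing pickles them to send them to the worker processes, and S3 clients are not picklable. Moving the S3 call out of the function that calls pool.starmap(), so that only the already loaded array is passed in, can solve the issue: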

    import multiprocessing as mp
    from functools import partial
    
    import boto3
    import numpy as np
    
    
    s3 = boto3.client('s3')
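    # Move the S3 call here, outside of the do() function.
    # Simplified; details not relevant. The real call is s3.get_object(Bucket=..., Key=...),
    # which returns a dict whose 'Body' stream holds the data.
    archive = np.load(s3.get_object('some_key'))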
    
    def _something(part, target_part, **kwargs):  # starmap unpacks each (part, target_part) tuple into positional arguments
        # Some mixed integer programming stuff related to the variable archive
        return np.array(some_variable_related_to_archive)
    
    
    def do(archive): # pass the previously loaded archive, and not the s3 object into the function
        pool = mp.Pool()
        sub_process = partial(_something, slack=0.1)
        parts = np.array_split(archive, some_int)
        target_parts = np.array(things)
    
        out = pool.starmap(sub_process, zip(parts, target_parts))  # Previously failed here when the S3 client had to be pickled
    
        pool.close()
        pool.join()
    
    do(archive) # pass the previously loaded archive, and not the s3 object into the function
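
To check up front whether something can be handed to a pool, one option is to try pickling it yourself. The sketch below is only an illustration: `LockHoldingClient` is a hypothetical stand-in for an object that holds unpicklable state (here a thread lock), and `is_picklable` is a helper name made up for this example. Whether a real boto3 client fails for the same underlying reason is an assumption; the point is only that plain data passes the check and such objects do not.

```python
import pickle
import threading


class LockHoldingClient:
    """Hypothetical stand-in for a client object that holds unpicklable state."""

    def __init__(self):
        # Thread locks cannot be pickled, so instances of this class cannot either.
        self._lock = threading.Lock()


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


print(is_picklable([1, 2, 3]))            # plain data, the kind that is safe to pass to a pool
print(is_picklable(LockHoldingClient()))  # object holding a lock
```

Running the same check on the arguments you intend to pass to pool.starmap() is a quick way to find the one that breaks before the pool is even created.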
    
