Using Concurrent Futures without running out of RAM

礼貌的吻别 2021-02-04 13:25

I'm doing some file parsing that is a CPU-bound task. No matter how many files I throw at the process, it uses no more than about 50MB of RAM. The task is parallelisable, and I

2 Answers
  • 2021-02-04 13:52

    You can try adding `del` to your code, like this:

    for job in futures.as_completed(jobs):
        del jobs[job]  # drop the dict's reference to the finished future
        del job        # or job._result = None, to release the stored result
    
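As a minimal sketch of why this helps: a future keeps a reference to its result, so as long as the `jobs` dict (or the loop variable) still points at the future, the result cannot be garbage-collected. This toy example uses `ThreadPoolExecutor` only so it runs anywhere without worker processes; the same reference behaviour applies to `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_blob(n):
    # Stand-in for a parse job that returns a large result.
    return b"x" * n

with ThreadPoolExecutor(max_workers=2) as executor:
    jobs = {executor.submit(make_blob, 1024): i for i in range(4)}
    total = 0
    for job in as_completed(list(jobs)):
        total += len(job.result())
        del jobs[job]  # drop the dict's reference to the future
        del job        # drop the loop variable too, so the result can be freed

print(total)  # 4096
```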
  • 2021-02-04 13:59

    I'll take a shot (this might be a wrong guess...).

    You might need to submit your work bit by bit, since each submit() makes a copy of parser_variables, which may end up chewing through your RAM.

    Here is working code with "<----" marking the interesting parts:

    with futures.ProcessPoolExecutor(max_workers=6) as executor:
        # Maps each submitted future (key) to its filename (value).
        jobs = {}
    
        # Loop through the files and run the parse function for each one,
        # passing the filename to it. The results can come back in any order.
        files_left = len(files_list) #<----
        files_iter = iter(files_list) #<------
    
        while files_left:
            for this_file in files_iter:
                job = executor.submit(parse_function, this_file, **parser_variables)
                jobs[job] = this_file
                if len(jobs) > MAX_JOBS_IN_QUEUE:
                    break  # limit how many jobs are queued at a time
    
            # Collect completed jobs as they finish.
            for job in futures.as_completed(jobs):
    
                files_left -= 1 #one down - many to go...   <---
    
                # Fetch the result and the filename the job was based on.
                results_list = job.result()
                this_file = jobs[job]
    
                # Delete the entry from the dict so we don't keep the result around.
                del jobs[job]
    
                # Post-processing (putting the results into a database).
                post_process(this_file, results_list)
                break  #give a chance to add more jobs <-----
    