multiprocessing queue full

悲哀的现实 2021-02-13 15:51

I'm using concurrent.futures to implement multiprocessing. I am getting a queue.Full error, which is odd because I am only assigning 10 jobs.

A_list = [np.rando
2 answers
  •  孤独总比滥情好
    2021-02-13 16:20

    I've recently stumbled upon this, while debugging a python3.6 program which sends various GBs of data over the pipes. This is what I found (hoping it could save someone else's time!).

    Like skrrgwasme said, if the queue manager is unable to acquire a semaphore while sending a poison pill, it raises a queue.Full error. The acquire call on the semaphore is non-blocking, and its failure causes the manager to fail (it's unable to send a 'control' command because data and control flow share the same Queue). Note that the links above refer to Python 3.6.0.

    Now I was wondering why my queue manager would send the poison pill. There must have been some other failure! Apparently some exception had happened (in some other subprocess? in the parent?), and the queue manager was trying to clean up and shut down all the subprocesses. At this point I was interested in finding this root cause.

    Debugging the root cause

    I initially tried logging all exceptions in the subprocesses but apparently no explicit error happened there. From issue 3895:

    Note that multiprocessing.Pool is also broken when a result fails at unpickle.

    it seems that the multiprocessing module is broken in Python 3.6, in that it won't catch and handle a serialization error correctly.

    Unfortunately, due to time constraints I didn't manage to replicate and verify the problem myself, preferring to jump to the action points and better programming practices (don't send all that data through pipes :). Here are a couple of ideas:

    1. Try to pickle the data that is supposed to run through the pipes. Due to the sheer size of my data (hundreds of GBs) and time constraints, I didn't manage to find which records were unserializable.
    2. Put a debugger into python3.6 and print the original exception.
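    Idea 1 can be sketched as a pre-flight check: before submitting work, try to pickle each record and collect the ones that fail (the sample records below are purely illustrative):

    ```python
    import pickle


    def find_unpicklable(records):
        """Return (index, error) pairs for records that fail to pickle."""
        failures = []
        for i, rec in enumerate(records):
            try:
                pickle.dumps(rec)
            except Exception as exc:  # pickle raises several error types
                failures.append((i, exc))
        return failures


    # A lambda is a classic unpicklable object; ints, strings, dicts are fine.
    records = [1, "ok", lambda x: x, {"key": [1, 2]}]
    bad = find_unpicklable(records)
    print([i for i, _ in bad])  # indices of unpicklable records
    ```

    For hundreds of GBs this check is expensive, but running it on a sample can often surface the offending record type quickly.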

    Action points

    1. Remodel your program to send less data through the pipes if possible.

    2. After reading issue 3895 it appears the problem arises with pickling errors. An alternative (and good programming practice) could be to transfer the data using different means. For example one could have the subprocesses write to files and return the paths to the parent process (this would be just a small string, probably a few bytes).

    3. Wait for future Python versions. Apparently this was fixed in Python version tag v3.7.0b3 in the context of issue 3895: the Full exception will be handled inside shutdown_worker. The current maintenance version of Python at the time of writing is 3.6.5.
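    Action point 2 can be sketched as follows: each worker writes its (possibly large) result to a temporary file and returns only the path, so just a short string crosses the result pipe (the worker logic and payload here are illustrative):

    ```python
    import concurrent.futures
    import os
    import tempfile


    def heavy_work(n):
        """Write a large result to disk and return only its path."""
        data = "x" * n  # stand-in for a big payload
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write(data)
        return path  # a few bytes cross the pipe, not the payload


    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor() as executor:
            paths = list(executor.map(heavy_work, [10, 100, 1000]))
        sizes = [os.path.getsize(p) for p in paths]
        print(sizes)
        for p in paths:  # the parent owns cleanup of the temp files
            os.remove(p)
    ```

    This sidesteps result pickling almost entirely (only the path string is serialized), at the cost of disk I/O and of having to manage the files' lifetime yourself.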
