Python rpy2 and matplotlib conflict when using multiprocessing

误落风尘 2021-02-07 14:14

I am trying to calculate and generate plots using multiprocessing. On Linux the code below runs correctly; on the Mac (ML), however, it doesn't, giving the error below:



        
5 Answers
  • 2021-02-07 14:57

    This error occurs on Mac OS X when you perform a GUI operation outside the main thread, which is exactly what happens when you move your plot function into a multiprocessing.Pool. (I imagine it will not work on Windows either, since Windows has the same requirement.) The only way I can imagine it working is to use the pool to generate the data and have the main thread wait in a loop for the data that is returned; a queue is how I usually handle it.

    Here is an example. It may not do what you want if the goal is to plot all the figures "simultaneously": plt.show() blocks, so only one figure is drawn at a time. I note that your sample code does not call plt.show(), but without it I don't see anything on my screen. If I take it out, there is no blocking and no error, because all GUI functions happen in the main thread.

    import multiprocessing
    import matplotlib.pyplot as plt
    import numpy as np
    import rpy2.robjects as robjects
    
    data_queue = multiprocessing.Queue()
    
    
    def main():
        pool = multiprocessing.Pool()
        num_figs = 10
    
        # generate some random numbers
        # (named "tasks" so the built-in input() is not shadowed)
        tasks = zip(np.random.randint(10, 10000, num_figs), range(num_figs))
        pool.map(worker, tasks)
    
        figs_complete = 0
        while figs_complete < num_figs:
            data = data_queue.get()
            plt.figure()
            plt.plot(data)
            plt.show()
            figs_complete += 1
    
    def worker(args):
        num, i = args
        data = np.random.randn(num).cumsum()
        data_queue.put(data)
        print('done ',i)
    
    if __name__ == '__main__':
        main()
    

    Hope this helps.
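
A variation on the same idea, sketched under some assumptions: instead of a shared queue, the workers return their data through pool.map, and only the parent process ever calls matplotlib. The names random_walk and show_walks are made up for illustration, and the standard-library random module stands in for NumPy. This is a sketch of the approach, not the original poster's code.

```python
import itertools
import multiprocessing
import random


def random_walk(args):
    """Worker: pure computation only, no matplotlib calls."""
    seed, length = args
    rng = random.Random(seed)
    steps = (rng.gauss(0.0, 1.0) for _ in range(length))
    return list(itertools.accumulate(steps))


def show_walks(walks):
    """Parent-side plotting; matplotlib is imported here, not in the workers."""
    import matplotlib.pyplot as plt

    for i, walk in enumerate(walks):
        plt.figure()
        plt.plot(walk)
        plt.title(f"walk {i}")
    plt.show()  # every figure is created and shown by the parent process


def main(num_figs=3):
    tasks = [(seed, 100 * (seed + 1)) for seed in range(num_figs)]
    with multiprocessing.Pool() as pool:
        walks = pool.map(random_walk, tasks)  # results come back in task order
    show_walks(walks)


if __name__ == "__main__":
    main()
```

Because pool.map already collects the return values, no explicit queue or counting loop is needed, and the plotting code stays entirely in the parent process.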

  • 2021-02-07 15:00

    This might be rpy2-specific. There are reports of a similar problem with OS X and multiprocessing here and there.

    I think that using a pool initializer that imports the packages needed to run the code in plot could solve the problem (see the multiprocessing documentation).
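
A minimal sketch of what such an initializer might look like. Pool accepts initializer and initargs arguments, and the initializer runs once in each worker before it receives any task. The helper names and the MODULES_TO_PRELOAD tuple are hypothetical; the standard-library modules listed there are only stand-ins for whatever the plot function really needs (for example matplotlib.pyplot and rpy2.robjects). Whether this avoids the error on OS X is untested here.

```python
import importlib
import multiprocessing
import sys

# Hypothetical stand-ins; in the thread's case this might instead be
# ("matplotlib.pyplot", "rpy2.robjects").
MODULES_TO_PRELOAD = ("json", "math")


def init_worker(module_names):
    """Runs once in every worker process, before any task is handed to it."""
    for name in module_names:
        importlib.import_module(name)


def loaded_in_worker(name):
    """Task: report whether the module is imported in this worker process."""
    return name in sys.modules


if __name__ == "__main__":
    with multiprocessing.Pool(
        processes=2,
        initializer=init_worker,
        initargs=(MODULES_TO_PRELOAD,),
    ) as pool:
        print(pool.map(loaded_in_worker, MODULES_TO_PRELOAD))
```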

  • 2021-02-07 15:13

    Try to upgrade matplotlib to 3.0.3:

    pip3 install matplotlib --upgrade
    

    Then everything goes fine.

    =======================================================================

    No need to read below anymore.

    Yesterday, my multiprocess plotting code worked on my MacBook Air. The next morning the same code failed on my MacBook Pro, printing many messages like:

    The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
    Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
    The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
    

    Both machines use 4th-generation Intel Core CPUs (an i5-4xxx in the Air and an i7-4xxx in the Pro). So if there is no difference in hardware, the difference must be in software.

    So I updated matplotlib to 3.0.3 on the MacBook Pro (it was 3.0.1), and everything works fine.

    Also, no need to do pool.apply_async anymore.

  • 2021-02-07 15:14

    I had a similar issue and found that setting the multiprocessing start method to forkserver works, as long as the call comes after your if __name__ == '__main__': statement.

    import multiprocessing

    # targetOne and targetTwo are your own worker functions.
    if __name__ == '__main__':
        multiprocessing.set_start_method('forkserver')
        first_process = multiprocessing.Process(target=targetOne)
        second_process = multiprocessing.Process(target=targetTwo)
        first_process.start()
        second_process.start()
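
A related sketch, in case changing the start method globally is not desirable: multiprocessing.get_context returns a context object with the same API as the module, bound to one start method, so it can be used without calling set_start_method (which may only be called once per program). The functions describe and target are hypothetical stand-ins for the targets above, and "spawn" is used here only because it exists on every platform.

```python
import multiprocessing
import os


def describe(label):
    """Build the message a child prints; kept separate so it is easy to test."""
    return f"{label}: running in process {os.getpid()}"


def target(label):
    # Hypothetical stand-in for targetOne / targetTwo.
    print(describe(label))


if __name__ == "__main__":
    # On Unix, "forkserver" can be substituted for "spawn" here.
    ctx = multiprocessing.get_context("spawn")
    processes = [
        ctx.Process(target=target, args=(name,)) for name in ("first", "second")
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

With either of these start methods the children are fresh interpreters rather than plain forked copies of the parent, which is the condition the CoreFoundation message complains about.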
    
  • 2021-02-07 15:17

    I had a similar issue with my worker, which was loading some data, generating a plot, and saving it to a file. Note that this is slightly different from the OP's case, which seems to be oriented around interactive plotting. Still, I think it's relevant.

    A simplified version of my code:

    def worker(id):
        data = load_data(id)
        plot_data_to_file(data) # Generates a plot and saves it to a file.
    
    def plot_something_parallel(ids):
        pool = multiprocessing.Pool()
        pool.map(worker, ids)
    
    plot_something_parallel(ids=[1,2,3])
    

    This caused the same error others mention:

    The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
    Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
    

    Following @bbbruce's train of thought, I solved my problem by switching the matplotlib backend from TkAgg to the default. Specifically, I commented out the following line in my matplotlibrc file:

    #backend : TkAgg
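
The backend can also be chosen in code rather than in matplotlibrc. The sketch below assumes the plots only need to be written to files, so it selects the non-GUI Agg backend with matplotlib.use before pyplot is imported. The function plot_to_file and its made-up data are hypothetical stand-ins for load_data and plot_data_to_file above. Whether this removes the error on a given setup still depends on the matplotlib version and start method.

```python
import multiprocessing
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-GUI backend; selected before pyplot is imported
import matplotlib.pyplot as plt


def plot_to_file(task):
    """Worker: draw one figure and save it as a PNG, returning its path."""
    plot_id, out_dir = task
    values = [plot_id * x for x in range(10)]  # stand-in for load_data(id)
    path = os.path.join(out_dir, f"plot_{plot_id}.png")
    fig, ax = plt.subplots()
    ax.plot(values)
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory inside the worker
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    with multiprocessing.Pool(processes=2) as pool:
        paths = pool.map(plot_to_file, [(i, out_dir) for i in (1, 2, 3)])
    print(paths)
```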
    