“Can't pickle ” error when using multiprocessing on Windows

悲哀的现实 2021-01-12 11:55

I'm writing a multiprocessing program to process a large .CSV file in parallel on Windows.

I found this excellent example for a similar problem. When running it

2 Answers
  • 2021-01-12 12:25

    Multiprocessing depends on serializing (pickling) and de-serializing objects when they are passed as parameters between processes. Your code passes an instance of CSVWorker (the instance denoted as 'self') to the child processes, and that instance holds csv readers and open files, neither of which can be pickled. That is why you get this error.

    You mentioned that your CSV files are large, so I don't think reading all the data into a list would be a solution for you. Instead, think of a way to pass one line of the input CSV to a worker at a time and retrieve the processed line from it, while performing all the I/O in the main process.

    It looks like multiprocessing.Pool would be a better way of writing your application. Check the multiprocessing documentation at http://docs.python.org/library/multiprocessing.html and try using a process pool with pool.map to process your CSVs. It also takes care of preserving the order, which will eliminate a lot of the complicated logic in your code.
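
A minimal sketch of the pool-based approach described in this answer, assuming Python 3. The `process_row` body, the `process_csv` helper, and the default worker count and chunk size are hypothetical placeholders, not the asker's code. It uses `pool.imap` rather than `pool.map`, because `map` first converts its input to a list, which is what the answer advises against for a large file; `imap` still returns results in input order. All file I/O stays in the parent process, so only plain lists of strings are pickled.

```python
import csv
import sys
from multiprocessing import Pool


def process_row(row):
    """Hypothetical per-row work: upper-case every field.

    Defined at module level so that child processes can find it by name.
    """
    return [field.upper() for field in row]


def process_csv(in_path, out_path, workers=4, chunksize=1000):
    """Read and write in the parent; only rows (lists of str) cross processes."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        with Pool(processes=workers) as pool:
            # imap yields results in the same order as the input rows.
            for out_row in pool.imap(process_row, reader, chunksize=chunksize):
                writer.writerow(out_row)


if __name__ == "__main__":
    # This guard is required on Windows, where each child process
    # re-imports the main module when it starts.
    if len(sys.argv) == 3:
        process_csv(sys.argv[1], sys.argv[2])
    else:
        print("usage: python script.py INPUT.csv OUTPUT.csv")
```

The open files and the csv reader never leave the parent process, so nothing unpicklable is sent to the workers.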

  • 2021-01-12 12:32

    The problem you're running into is caused by using methods of the CSVWorker class as the process targets. That class has members that cannot be pickled, and those open files are just never going to work.

    What you want to do is break that class into two classes: one that coordinates all of the worker subprocesses, and another that actually does the computational work. The worker processes take filenames as arguments and open the individual files as needed, or at least wait until their worker methods are invoked and open the files only then. They can also take multiprocessing.Queue objects as arguments or as instance members; those are safe to pass around.

    To a certain extent, you already do this: your write_output_csv method opens its file in the subprocess, but your parse_input_csv method expects to find an already open and prepared file as an attribute of self. Do it the way write_output_csv does, consistently, and you should be in good shape.
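
A rough sketch of the split suggested in this answer, assuming Python 3. For brevity it uses plain functions instead of the two classes; the names `parse_input_csv` and `write_output_csv` come from the question, but their bodies, the `run` coordinator, the `SENTINEL` marker, and the queue size are assumptions made for illustration. The point it shows is that each subprocess receives only a filename and a `multiprocessing.Queue`, and opens its own file after it has started.

```python
import csv
import sys
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker


def parse_input_csv(in_path, row_queue):
    """Worker: opens the input file inside the subprocess, by name."""
    with open(in_path, newline="") as src:
        for row in csv.reader(src):
            row_queue.put(row)
    row_queue.put(SENTINEL)  # tell the consumer that no more rows follow


def write_output_csv(out_path, row_queue):
    """Worker: opens the output file inside the subprocess, by name."""
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        # Keep calling row_queue.get() until it returns SENTINEL.
        for row in iter(row_queue.get, SENTINEL):
            writer.writerow(row)


def run(in_path, out_path):
    """Coordinator: passes only picklable things (paths and a Queue)."""
    row_queue = Queue(maxsize=1000)  # bounded, so a large file is not buffered whole
    reader = Process(target=parse_input_csv, args=(in_path, row_queue))
    writer = Process(target=write_output_csv, args=(out_path, row_queue))
    reader.start()
    writer.start()
    reader.join()
    writer.join()


if __name__ == "__main__":
    # Required on Windows, where child processes re-import this module.
    if len(sys.argv) == 3:
        run(sys.argv[1], sys.argv[2])
    else:
        print("usage: python script.py INPUT.csv OUTPUT.csv")
```

Any per-row computation would sit between the two workers, for example in additional processes that read from one queue and write to another.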
