I am attempting to download an entire FTP directory in parallel.
#!/usr/bin/python
import sys
import datetime
import os
from multiprocessing import Process, Pool
Update, May 9, 2014:
I have determined the precise limitation. It is possible to send objects across process boundaries to worker processes as long as the objects can be pickled by Python's pickle facility. The problem which I described in my original answer occurred because I was trying to send a file handle to the workers. A quick experiment demonstrates why this doesn't work:
>>> f = open("/dev/null")
>>> import pickle
>>> pickle.dumps(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
Thus, if you're encountering the Python error which led you to find this Stack Overflow question, make sure all the things you're sending across process boundaries can be pickled.
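A quick way to check this up front is to try pickling each argument yourself before dispatching it. The small helper below is hypothetical (not part of the original post), but anything that fails pickle.dumps here will fail the same way inside apply_async or map:
import pickle

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False

print(is_picklable("ftp.example.com/pub/file.iso"))  # True: plain string
print(is_picklable(open("/dev/null")))               # False: file handle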
Original answer:
I'm a bit late in answering. However, I ran into the same error message as the original poster while trying to use Python's multiprocessing module, so I'll record my findings here in case they give anyone else who stumbles upon this thread something to try.
In my case, the error occurred because of what I was trying to send to the pool of workers: I was trying to pass an array of file objects for the pool workers to chew on. That's apparently too much to send across process boundaries in Python. I solved the problem by sending the pool workers dictionaries which specified input and output filename strings.
So it seems that whatever you supply to a function such as apply_async (I used map() and imap_unordered()) can contain numbers, strings, or even a detailed dictionary data structure, as long as the values aren't unpicklable objects such as file handles.
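To make that concrete, here is a minimal sketch of the dictionary approach. The process_file worker and the filename keys are made up for illustration, not taken from the original code; the point is that every work item is a plain dictionary of strings, which pickles without complaint:
from multiprocessing import Pool

def process_file(job):
    # The worker opens its own file handles from the picklable strings;
    # nothing unpicklable ever crosses the process boundary.
    with open(job["input"], "rb") as src, open(job["output"], "wb") as dst:
        dst.write(src.read())
    return job["output"]

if __name__ == "__main__":
    jobs = [
        {"input": "data/a.txt", "output": "out/a.txt"},
        {"input": "data/b.txt", "output": "out/b.txt"},
    ]
    pool = Pool(processes=4)
    print(pool.map(process_file, jobs))  # each dict pickles cleanly
    pool.close()
    pool.join()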
In your case:
pool.apply_async(downloadFile, (filename,local_filename,ftp))
ftp is an object, which might be causing the problem. As a workaround, I would recommend sending plain parameters to the worker (it looks like host and path in this case) and letting the worker instantiate the FTP object and deal with the cleanup itself.
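A rough sketch of that workaround, assuming ftplib and a downloadFile-style worker (the host, directory, and filenames below are placeholders): only strings cross the process boundary, and each worker builds and tears down its own FTP connection.
import os
from ftplib import FTP
from multiprocessing import Pool

def downloadFile(host, remote_dir, filename, local_filename):
    # Only strings arrive here; the FTP object is created (and closed)
    # inside the worker process, so nothing unpicklable is ever sent.
    ftp = FTP(host)
    ftp.login()                      # anonymous login; adjust as needed
    ftp.cwd(remote_dir)
    with open(local_filename, "wb") as fh:
        ftp.retrbinary("RETR " + filename, fh.write)
    ftp.quit()
    return local_filename

if __name__ == "__main__":
    host, remote_dir = "ftp.example.com", "/pub"
    files = ["a.iso", "b.iso"]
    pool = Pool(processes=4)
    for name in files:
        pool.apply_async(downloadFile, args=(host, remote_dir, name, name))
    pool.close()
    pool.join()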
Have you tried:
pool.apply_async(downloadFile, args=(filename,local_filename,ftp))
The prototype is:
apply_async(func, args=(), kwds={}, callback=None)
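For illustration, here is that signature exercised with a toy function (add and report are hypothetical names): args and kwds are forwarded to the worker, callback runs in the parent with the result, and the AsyncResult returned by apply_async blocks in get() until the value is ready.
from multiprocessing import Pool

def add(a, b, scale=1):
    return (a + b) * scale

def report(value):
    # the callback runs in the parent process once the worker returns
    print("callback got: %s" % value)

if __name__ == "__main__":
    pool = Pool(processes=2)
    result = pool.apply_async(add, args=(2, 3), kwds={"scale": 10}, callback=report)
    print(result.get(timeout=5))  # blocks for the worker's return value: 50
    pool.close()
    pool.join()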