Python multiprocessing: TypeError: expected string or Unicode object, NoneType found

眼角桃花 2021-01-01 17:08

I am attempting to download an entire FTP directory in parallel.

#!/usr/bin/python
import sys
import datetime
import os
from multiprocessing import Process, Po         


        
2 answers
  • 2021-01-01 17:44

    Update, May 9, 2014:

    I have determined the precise limitation. It is possible to send objects across process boundaries to worker processes as long as the objects can be pickled by Python's pickle facility. The problem which I described in my original answer occurred because I was trying to send a file handle to the workers. A quick experiment demonstrates why this doesn't work:

    >>> f = open("/dev/null")
    >>> import pickle
    >>> pickle.dumps(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
      File "/usr/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/usr/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle file objects
    

    Thus, if you're encountering the Python error which led you to find this Stack Overflow question, make sure all the things you're sending across process boundaries can be pickled.
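
One way to apply that advice is to test each argument with pickle before handing it to the pool. The following is a minimal sketch written for Python 3; the helper name `find_unpicklable` is hypothetical and not part of the standard library.

```python
import os
import pickle


def find_unpicklable(args):
    """Return (index, type name, error message) for each argument pickle rejects."""
    problems = []
    for index, value in enumerate(args):
        try:
            pickle.dumps(value)
        except Exception as exc:  # TypeError, PicklingError, AttributeError, ...
            problems.append((index, type(value).__name__, str(exc)))
    return problems


if __name__ == "__main__":
    # An open file handle is the kind of argument that fails.
    with open(os.devnull) as handle:
        for index, type_name, message in find_unpicklable(("report.txt", 42, handle)):
            print("argument %d (%s): %s" % (index, type_name, message))
```

Running a check like this on the argument tuple before calling apply_async or map points at the offending argument directly, which the error raised inside the pool usually does not.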

    Original answer:

    I'm a bit late to answering. However, I ran into the same error message as the original poster while trying to use Python's multiprocessing module. I'll record my findings so that anyone else who stumbles upon this thread has something to try.

    In my case, the error occurred because of what I was trying to send to the pool of workers: I was trying to pass an array of file objects for the pool workers to chew on. That's apparently too much to send across process boundaries in Python. I solved the problem by sending the pool workers dictionaries which specified input and output filename strings.

    So it seems that the arguments you supply to a pool method such as apply_async (I used map() and imap_unordered()) can be numbers or strings, or even a detailed dictionary data structure, as long as every value in it can be pickled.
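
The dictionary approach described above might look roughly like this. It is a sketch only: the names `copy_job` and `run_jobs` and the "input"/"output" keys are assumptions made for illustration, and the worker simply copies one file to another as a stand-in for the real processing.

```python
import shutil
from multiprocessing import Pool


def copy_job(job):
    """Worker: open the files named in the job dict itself, then copy input to output."""
    with open(job["input"], "rb") as src, open(job["output"], "wb") as dst:
        shutil.copyfileobj(src, dst)
    return job["output"]


def run_jobs(jobs, processes=2):
    """Send plain dicts of filename strings to the pool; return the sorted output paths."""
    pool = Pool(processes=processes)
    try:
        # imap_unordered yields results in completion order, so sort for stability.
        return sorted(pool.imap_unordered(copy_job, jobs))
    finally:
        pool.close()
        pool.join()
```

Only filename strings cross the process boundary here; each worker opens and closes its own file handles. Call `run_jobs` from under an `if __name__ == "__main__":` guard so that it also works on platforms where worker processes re-import the main module.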

    In your case:

    pool.apply_async(downloadFile, (filename,local_filename,ftp))
    

    ftp is a live connection object, and it most likely cannot be pickled, which would explain the error. As a workaround, I would recommend sending plain parameters to the worker (it looks like host and path in this case) and letting the worker instantiate the object and deal with the cleanup itself.
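
A sketch of that workaround follows. It assumes the question uses the standard ftplib module and an anonymous login, neither of which is visible in the truncated question code; the names `download_file`, `build_jobs`, and `download_all` are hypothetical.

```python
import ftplib
import os


def download_file(host, remote_dir, filename, local_dir):
    """Worker: open its own FTP connection, fetch one file, then disconnect."""
    local_path = os.path.join(local_dir, filename)
    ftp = ftplib.FTP(host)
    try:
        ftp.login()  # anonymous login (assumption)
        ftp.cwd(remote_dir)
        with open(local_path, "wb") as handle:
            ftp.retrbinary("RETR " + filename, handle.write)
    finally:
        ftp.close()  # drop the connection even if the transfer failed
    return local_path


def build_jobs(host, remote_dir, filenames, local_dir):
    """Build one argument tuple per file; every element is a plain string."""
    return [(host, remote_dir, name, local_dir) for name in filenames]


def download_all(pool, jobs):
    """Schedule every job on an existing Pool and wait for all of them."""
    results = [pool.apply_async(download_file, job) for job in jobs]
    # get() re-raises any exception that occurred inside the worker.
    return [result.get() for result in results]
```

Each worker pays the cost of its own login, but nothing that wraps a socket has to be pickled. Calling `get()` on every result also surfaces worker errors that would otherwise pass silently.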

  • 2021-01-01 17:48

    Have you tried:

    pool.apply_async(downloadFile, args=(filename,local_filename,ftp))
    

    The prototype is:

    apply_async(func, args=(), kwds={}, callback=None)
    