python struct.error: 'i' format requires -2147483648 <= number <= 2147483647

予麋鹿 2020-12-03 06:49

Problem

I'd like to do feature engineering using the multiprocessing module (multiprocessing.Pool.starmap()). However, it gives an error message as f…

2 Answers
  • 2020-12-03 07:40

    The communication protocol between processes uses pickling, and the pickled data is prefixed with the size of the pickled data. For your method, all arguments together are pickled as one object.
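To make the framing concrete, here is a minimal sketch of length-prefixed pickling in the style of the pre-3.8 multiprocessing.connection wire format: a 4-byte big-endian signed length, then the pickled payload. The helpers frame() and unframe() are hypothetical illustrations; the real Connection class does this internally.

```python
import pickle
import struct


def frame(obj):
    """Pickle obj and prefix it with its length as a 4-byte signed big-endian int."""
    payload = pickle.dumps(obj)
    return struct.pack("!i", len(payload)) + payload


def unframe(data):
    """Read the 4-byte length header, then unpickle exactly that many bytes."""
    (size,) = struct.unpack("!i", data[:4])
    return pickle.loads(data[4:4 + size])


# starmap-style task: all arguments for one call travel as one pickled tuple
args = ("feature_a", [1, 2, 3])
message = frame(args)
print(len(message) - 4)   # size of the pickled payload in bytes
print(unframe(message))   # ('feature_a', [1, 2, 3])
```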

    You produced an object that, when pickled, is larger than fits in the 'i' struct format (a four-byte signed integer, so at most 2147483647 bytes, just under 2 GiB), which breaks the assumptions the code has made.
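The error in the title can be reproduced without allocating any large buffer: packing a length one past the signed 32-bit maximum with the same "!i" format the pre-3.8 connection code uses for its size prefix is enough. The length_header() helper is a hypothetical stand-in for that step.

```python
import struct

INT32_MAX = 2**31 - 1  # largest length a "!i" header can hold (~2 GiB)


def length_header(n):
    # Same format string the pre-3.8 Connection uses for its size prefix
    return struct.pack("!i", n)


print(length_header(INT32_MAX).hex())  # 7fffffff
try:
    length_header(INT32_MAX + 1)       # a payload one byte too large
except struct.error as exc:
    print("struct.error:", exc)
```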

    You could instead delegate reading the dataframes to the child processes, sending across only the metadata needed to load each one. Their combined size is nearing 1 GB, which is far too much data to share over a pipe between your processes.
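A minimal sketch of that idea, assuming the data already lives on disk: each task receives only a file path and a column name, and the worker loads the data itself. Plain dicts written with pickle stand in for dataframes here so the example runs with the standard library only; load_and_sum() and the part*.pkl files are hypothetical.

```python
import os
import pickle
import tempfile
from multiprocessing import Pool


def load_and_sum(path, column):
    # Runs in the child: only `path` and `column` (a few bytes) were pickled
    # and sent over the pipe; the data itself is loaded from disk here.
    with open(path, "rb") as f:
        data = pickle.load(f)  # stand-in for e.g. pandas.read_pickle(path)
    return sum(data[column])


def main():
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical setup: write three small "dataframes" to disk.
        paths = []
        for i in range(3):
            path = os.path.join(tmpdir, f"part{i}.pkl")
            with open(path, "wb") as f:
                pickle.dump({"x": list(range(i * 10, i * 10 + 10))}, f)
            paths.append(path)

        with Pool(2) as pool:
            # Each task's arguments are just (path, column name).
            return pool.starmap(load_and_sum, [(p, "x") for p in paths])


if __name__ == "__main__":
    print(main())  # [45, 145, 245]
```

The per-task message stays tiny no matter how large the files are, so the 4-byte length header is never at risk.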

    Quoting from the Programming guidelines section of the multiprocessing documentation:

    Better to inherit than pickle/unpickle

    When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

    If you are not running on Windows and can use the fork start method (rather than spawn or forkserver), you could load your dataframes as globals before starting your subprocesses, at which point the child processes will 'inherit' the data via the normal OS copy-on-write memory page sharing mechanisms.
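A minimal sketch of inheriting data through fork, assuming a POSIX system where the "fork" start method is available: the global is filled in before the pool is created, so the forked workers already have it in memory and only the short column name is pickled per task. DATA and column_sum() are hypothetical stand-ins for your dataframes and feature function.

```python
import multiprocessing as mp

# Global that forked child processes will inherit; a stand-in for your dataframes.
DATA = {}


def column_sum(column):
    # Runs in a forked child: DATA comes from memory inherited from the parent,
    # so only the short column name is pickled per task.
    return sum(DATA[column])


def main():
    # Load the data BEFORE the pool (and its worker processes) is created.
    DATA["a"] = list(range(100))
    DATA["b"] = list(range(10))
    ctx = mp.get_context("fork")  # raises ValueError where fork is unavailable (Windows)
    with ctx.Pool(2) as pool:
        return pool.map(column_sum, ["a", "b"])


if __name__ == "__main__":
    print(main())  # [4950, 45]
```

Requesting the "fork" context explicitly matters because spawn and forkserver start fresh interpreters that would not see DATA filled in by main().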

    Note that this limit was raised for non-Windows systems in Python 3.8: lengths that don't fit in the 4-byte header are now sent with an 8-byte unsigned long long header, so in theory you can send and receive up to 16 EiB (2**64 bytes) of data. See this commit, and Python issues #35152 and #17560.
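A quick check of the arithmetic behind the two header formats (this only evaluates the limits with the struct module; it doesn't send any data):

```python
import struct

OLD_MAX = 2**31 - 1  # largest length a 4-byte signed "!i" header can carry
NEW_MAX = 2**64 - 1  # largest length an 8-byte unsigned "!Q" header can carry

print(struct.calcsize("!i"), struct.calcsize("!Q"))  # 4 8
print(f"{(OLD_MAX + 1) / 2**30:.0f} GiB")           # 2 GiB
print(f"{(NEW_MAX + 1) / 2**60:.0f} EiB")           # 16 EiB
```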

    If you can't upgrade, can't make use of resource inheritance, and are not running on Windows, then use this patch:

    import functools
    import logging
    import struct
    import sys
    
    logger = logging.getLogger()
    
    
    def patch_mp_connection_bpo_17560():
        """Apply PR-10305 / bpo-17560 connection send/receive max size update
    
        See the original issue at https://bugs.python.org/issue17560 and 
        https://github.com/python/cpython/pull/10305 for the pull request.
    
        This only supports Python versions 3.3 - 3.7, this function
        does nothing for Python versions outside of that range.
    
        """
        patchname = "Multiprocessing connection patch for bpo-17560"
        if not (3, 3) < sys.version_info < (3, 8):
            logger.info(
                patchname + " not applied, not an applicable Python version: %s",
                sys.version
            )
            return
    
        from multiprocessing.connection import Connection
    
        orig_send_bytes = Connection._send_bytes
        orig_recv_bytes = Connection._recv_bytes
        if (
            orig_send_bytes.__code__.co_filename == __file__
            and orig_recv_bytes.__code__.co_filename == __file__
        ):
            logger.info(patchname + " already applied, skipping")
            return
    
        @functools.wraps(orig_send_bytes)
        def send_bytes(self, buf):
            n = len(buf)
            if n > 0x7fffffff:
                pre_header = struct.pack("!i", -1)
                header = struct.pack("!Q", n)
                self._send(pre_header)
                self._send(header)
                self._send(buf)
            else:
                orig_send_bytes(self, buf)
    
        @functools.wraps(orig_recv_bytes)
        def recv_bytes(self, maxsize=None):
            buf = self._recv(4)
            size, = struct.unpack("!i", buf.getvalue())
            if size == -1:
                buf = self._recv(8)
                size, = struct.unpack("!Q", buf.getvalue())
            if maxsize is not None and size > maxsize:
                return None
            return self._recv(size)
    
        Connection._send_bytes = send_bytes
        Connection._recv_bytes = recv_bytes
    
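        logger.info(patchname + " applied")
    
    
    # Apply the patch once at import time, before any Pool or Process is created.
    # Children started with the spawn method re-import the main module, so they
    # pick up the patch too when this code lives in (or is imported by) it.
    patch_mp_connection_bpo_17560()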
    
  • 2020-12-03 07:41

    This problem was fixed in a recent PR to Python: https://github.com/python/cpython/pull/10305 (the change shipped in Python 3.8).

    If you want, you can apply this change locally so it works for you right away, without waiting for a new Python or Anaconda release.
