I am trying to upload a file of around ~5 GB as shown below, but it throws the error "string longer than 2147483647 bytes". It sounds like there is a limit of 2 GB to upload.
2147483647 bytes is 2^31 - 1, which suggests the whole file is being read into a single string and something in the stack caps that at a signed 32-bit length. Your question has been asked on the requests bug tracker; their suggestion is to use a streaming upload. If that doesn't work, you might see if a chunk-encoded request works.
[edit]
Example based on the original code:
# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for the standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            # urllib.quote is Python 2; on Python 3 this lives at urllib.parse.quote
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large upload; note that
        # requests applies the timeout to individual socket operations, not to
        # the whole transfer, so a slow 5 GB upload won't be cut off at 300s.
        timeout=300,
    )
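Passing the open file object as `data` is what makes this a streaming upload: requests reads the body from disk as it sends and derives the Content-Length header from the file's size, so the 5 GB payload never has to exist in memory as one giant string.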
I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker comments I linked to also mention that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
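If you do end up suspecting the headers, here's a minimal sketch of what I'd try; the header names and `auth_token` below are placeholders, not taken from your code:
# Hypothetical minimal headers: keep only what your server actually requires.
# requests works out Content-Length from the file object on its own, so don't
# set Content-Length or Transfer-Encoding by hand.
headers = {
    'Content-Type': 'application/octet-stream',  # placeholder content type
    'Authorization': auth_token,                 # placeholder; whatever auth you use
}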
Regarding chunk encoding: this should be your second choice. Your code was not specifying `'rb'` as the mode for `open(...)`, so changing that should probably make the code above work. If not, you could try this:
def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than
    # this would be a good idea? 30720 * 30720 is ~900 MB per chunk.
    chunk_size = 30720 * 30720
    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data

# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
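One caveat with this version: when `data` is a generator, requests can't know the body size up front, so it sends the request with `Transfer-Encoding: chunked`, which your server has to support. With either version, it's worth failing loudly if the server rejects the upload:
# Raises requests.exceptions.HTTPError if the server responded with a 4xx/5xx
r.raise_for_status()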