I am trying to upload a file of around ~5 GB as shown below, but it throws the error "string longer than 2147483647 bytes". It sounds like there is a limit of 2 GB to upload.
2147483647 bytes is 2^31 - 1, which suggests the whole file is being read into a single string and something in the stack caps that at a signed 32-bit length. Your question has been asked on the requests bug tracker; their suggestion is to use a streaming upload. If that doesn't work, you might see if a chunk-encoded request works.
[edit]
Example based on the original code:
# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for the standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            # urllib.quote is Python 2; on Python 3 this lives at urllib.parse.quote
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large upload; note that
        # requests applies the timeout to individual socket operations, not to
        # the whole transfer, so a slow 5 GB upload won't be cut off at 300s.
        timeout=300,
    )
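Passing the open file object as `data` is what makes this a streaming upload: requests reads the body from disk as it sends and derives the Content-Length header from the file's size, so the 5 GB payload never has to exist in memory as one giant string.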
I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker comments I linked to also mention that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
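If you do end up suspecting the headers, here's a minimal sketch of what I'd try; the header names and `auth_token` below are placeholders, not taken from your code:
# Hypothetical minimal headers: keep only what your server actually requires.
# requests works out Content-Length from the file object on its own, so don't
# set Content-Length or Transfer-Encoding by hand.
headers = {
    'Content-Type': 'application/octet-stream',  # placeholder content type
    'Authorization': auth_token,                 # placeholder; whatever auth you use
}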
Regarding chunk encoding: this should be your second choice. Your code was not specifying `'rb'` as the mode for `open(...)`, so changing that should probably make the code above work. If not, you could try this:
def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than
    # this would be a good idea? 30720 * 30720 is ~900 MB per chunk.
    chunk_size = 30720 * 30720
    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data

# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
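One caveat with this version: when `data` is a generator, requests can't know the body size up front, so it sends the request with `Transfer-Encoding: chunked`, which your server has to support. With either version, it's worth failing loudly if the server rejects the upload:
# Raises requests.exceptions.HTTPError if the server responded with a 4xx/5xx
r.raise_for_status()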