Why is Python print delayed?

Question

I am trying to download a file using requests and print a dot each time another 100 KB of the file has been retrieved, but all the dots are printed at the end. See the code below.

with open(file_name,'wb') as file:
    print("begin downloading, please wait...")
    respond_file = requests.get(file_url,stream=True)
    size = len(respond_file.content)//1000000

    #the next line will not be printed until file is downloaded
    print("the file size is "+ str(size) +"MB")
    for chunk in respond_file.iter_content(102400):
        file.write(chunk)
        #print('',end='.')
        sys.stdout.write('.')
        sys.stdout.flush()
    print("")

Answer 1:

You are accessing the response's content attribute here:

size = len(respond_file.content)//1000000

Accessing that property forces the whole response to be downloaded, even with stream=True, and for large responses this takes some time. Use int(respond_file.headers['content-length']) instead:

size = int(respond_file.headers['content-length']) // 1000000

The Content-Length header is provided by the server and since it is part of the headers you have access to that information without downloading all of the content first.

If the server chooses to use Transfer-Encoding: chunked to stream the response, no Content-Length header has to be set; you may need to take that into account:

content_length = respond_file.headers.get('content-length', None)
size_in_kb = '{}KB'.format(int(content_length) // 1024) if content_length else 'Unknown'
print("the file size is", size_in_kb)

Here the size in kilobytes is calculated by dividing the length by 1024, rather than by 1 million as in the question.
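
The header lookup above can be wrapped in a small helper. This is only a sketch under a few assumptions: describe_size is a hypothetical name, not part of requests, and it takes any mapping of headers. With requests you would pass respond_file.headers, which is case-insensitive; a plain dict works here only if its key is spelled in lower case. A non-numeric header value is also treated as unknown, which is a choice made for this example.

```python
def describe_size(headers):
    """Return a readable size such as '2KB' from a headers mapping.

    Falls back to 'Unknown' when Content-Length is absent (for example with
    Transfer-Encoding: chunked) or is not a valid integer.
    """
    content_length = headers.get('content-length')
    if content_length is None:
        return 'Unknown'
    try:
        return '{}KB'.format(int(content_length) // 1024)
    except ValueError:
        return 'Unknown'
```

Typical use would be print("the file size is", describe_size(respond_file.headers)) right after the requests.get(..., stream=True) call.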

Alternatively, ask for the size in a separate HEAD request (only fetching the headers):

# HEAD requests do not follow redirects by default in requests
head_response = requests.head(file_url, allow_redirects=True)
size = int(head_response.headers.get('content-length', 0))

Answer 2:

The code below should work the way you expect. Taking the length of respond_file.content is not what you want, because it downloads the whole body first; check the Content-Length header instead.

Note: I changed the code to display KB instead (for the purposes of testing).

import requests
import sys

file_url = "https://github.com/kennethreitz/requests/archive/master.zip"
file_name = "out.zip"

with open(file_name,'wb') as file:
    print("begin downloading, please wait...")
    respond_file = requests.get(file_url,stream=True)
    size = int(respond_file.headers['content-length'])//1024

    #the size comes from the headers, so the next line is printed right away
    print("the file size is "+ str(size) +"KB")
    for chunk in respond_file.iter_content(1024):
        file.write(chunk)
        #print('',end='.')
        sys.stdout.write('.')
        sys.stdout.flush()
    print("")
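
The loop above prints one dot per chunk, so the meaning of a dot depends on the chunk size passed to iter_content. If the goal is one dot per 100 KB actually received, as in the question, the counting can be separated from the chunking. The following is a minimal sketch, not the answerer's method: save_with_progress and its parameters are hypothetical names, and it accepts any iterable of bytes so that it can be tried without a network connection.

```python
import io
import sys


def save_with_progress(chunks, out_file, dot_every=102400, progress=sys.stdout):
    """Write chunks to out_file, printing one dot per dot_every bytes received.

    Returns the total number of bytes written.
    """
    total = 0
    dots = 0
    for chunk in chunks:
        out_file.write(chunk)
        total += len(chunk)
        # A chunk may be smaller or larger than dot_every, so zero, one or
        # several dots can become due after each write.
        due = total // dot_every - dots
        if due:
            progress.write('.' * due)
            progress.flush()  # show the dots immediately instead of buffering
            dots += due
    progress.write('\n')
    return total


# Demo with in-memory data standing in for a real download.
demo_out = io.BytesIO()
demo_progress = io.StringIO()
written = save_with_progress([b'x' * 50000] * 5, demo_out, progress=demo_progress)
```

With requests, the call would look like save_with_progress(respond_file.iter_content(8192), file), where the chunk size of 8192 is an arbitrary choice for illustration.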

Answer 3:

As @kevin writes in a comment, respond_file.content blocks execution until the whole content is downloaded. The only difference between my answer and his comment is that I'm not guessing ;)

Source: https://stackoverflow.com/questions/30056960/why-python-print-is-delayed

Tags: python, python-3.x, web-crawler, python-requests