Question

I'm trying to develop a very simple proof-of-concept to retrieve and process data in a streaming manner. The server I'm requesting from will send data in chunks, which is good, but I'm having issues using httplib to iterate through the chunks.

Here's what I'm trying:

import httplib

def getData(src):
    d = src.read(1024)
    while d and len(d) > 0:
        yield d
        d = src.read(1024)

if __name__ == "__main__":
    con = httplib.HTTPSConnection('example.com', port='8443', cert_file='...', key_file='...')
    con.putrequest('GET', '/path/to/resource')
    response = con.getresponse()

    for s in getData(response):
        print s
        raw_input() # Just to give me a moment to examine each packet

Pretty simple. Just open an HTTPS connection to server, request a resource, and grab the result, 1024 bytes at a time. I'm definitely making the HTTPS connection successfully, so that's not a problem at all.

However, what I'm finding is that the call to src.read(1024) returns the same thing every time. It only ever returns the first 1024 bytes of the response, apparently never keeping track of a cursor within the file.

So how am I supposed to receive 1024 bytes at a time? The documentation on read() is pretty sparse. I've thought about using urllib or urllib2, but neither seems to be able to make an HTTPS connection.

HTTPS is required, and I am working in a rather restricted corporate environment where packages like Requests are a bit tough to get my hands on. If possible, I'd like to find a solution within Python's standard lib.

// Big Old Fat Edit

Turns out that in my original code I had simply forgotten to update the d variable. I initialized it with a read outside the yield loop and never changed it inside the loop. Once I added that back in, it worked perfectly.

So, in short, I'm just a big idiot.
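The fix described in the edit comes down to calling read() again on every pass, so that d actually changes. Here is a minimal sketch of that loop, assuming only that src has a read(n) method that returns an empty result at end of data. The name iter_chunks is made up for illustration, and io.BytesIO stands in for the response so the snippet runs without a server:

```python
import io


def iter_chunks(src, chunk_size=1024):
    """Yield successive chunks from src until read() returns nothing."""
    while True:
        d = src.read(chunk_size)  # re-read on every pass; a stale d would repeat the first chunk
        if not d:                 # an empty result means end of data
            break
        yield d


# io.BytesIO stands in for the HTTP response here (an assumption for the demo)
fake_response = io.BytesIO(b"a" * 2500)
sizes = [len(chunk) for chunk in iter_chunks(fake_response)]
```

With 2500 bytes of input and the default chunk size, sizes comes out as [1024, 1024, 452], so each read picks up where the previous one stopped.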

Answer 1:

Is your con.putrequest() actually working? Doing a request with that method requires you to also call a bunch of other methods as you can see in the official httplib documentation:

http://docs.python.org/2/library/httplib.html

As an alternative to using the request() method described above, you can also send your request step by step, by using the four functions below.

putrequest()
putheader()
endheaders()
send()
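The step-by-step sequence listed above could look roughly like this. It is only a sketch, not the asker's code: send_get_step_by_step is a hypothetical helper name, the Accept header is a placeholder, and the try/except import is there only so the snippet also runs on Python 3, where httplib is called http.client. send() is left out because a GET normally carries no body.

```python
try:
    import httplib  # Python 2 name, as used in this thread
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def send_get_step_by_step(con, path):
    """Issue a GET on an existing connection object using the low-level calls."""
    con.putrequest('GET', path)     # request line; Host and Accept-Encoding are added by default
    con.putheader('Accept', '*/*')  # placeholder header; add any others the same way
    con.endheaders()                # ends the header block and sends the buffered request
    return con.getresponse()        # read the reply only after the request has gone out
```

With a connection built as in the question, this would be called as response = send_get_step_by_step(con, '/path/to/resource'), and the response can then be read in chunks as usual.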

Is there any reason why you're not using the default HTTPConnection.request() function?

Here's a working version for me, using request() instead:

import httplib

def getData(src, chunk_size=1024):
    d = src.read(chunk_size)
    while d:
        yield d
        d = src.read(chunk_size)

if __name__ == "__main__":
    con = httplib.HTTPSConnection('google.com')
    con.request('GET', '/')
    response = con.getresponse()

    for s in getData(response, 8):
        print s
        raw_input() # Just to give me a moment to examine each packet

Answer 2:

You can use the seek command to move the cursor along with your read.

This is my attempt at the problem. I apologize if I made it less pythonic in process.

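import httplib

if __name__ == "__main__":
    con = httplib.HTTPSConnection('example.com', port='8443', cert_file='...', key_file='...')
    con.putrequest('GET', '/path/to/resource')
    response = con.getresponse()
    c = 0
    while True:
        # Assumes the response object supports seek(); the httplib docs do not list it
        response.seek(c * 1024, 0)
        data = response.read(1024)  # read from the response, not from an undefined d
        c += 1
        if len(data) == 0:
            break
        print data
        raw_input()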

I hope it is at least helpful.

Source: https://stackoverflow.com/questions/17452485/is-it-possible-to-loop-over-an-httplib-httpresponses-data

Tags

python

streaming

httplib

Is it possible to loop over an httplib.HTTPResponse's data?
