Question
I'm working with a process which is basically as follows:
- Take some list of URLs.
- Get a Response object from each.
- Create a BeautifulSoup object from the text of each Response.
- Pull the text of a certain tag from that BeautifulSoup object.
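The four steps above can be sketched sequentially as follows. To keep the sketch self-contained and testable, it swaps BeautifulSoup for the standard library's html.parser and takes the fetching function as a parameter. With BeautifulSoup the parse step would be roughly BeautifulSoup(html, "html.parser").find(tag).get_text(). The names FirstTagText, tag_text and scrape are hypothetical.

```python
from html.parser import HTMLParser


class FirstTagText(HTMLParser):
    """Collect the text inside the first <target_tag> element (stdlib stand-in for BeautifulSoup)."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.inside = 0        # nesting depth of target_tag while inside the first match
        self.finished = False  # True once the first matching element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag and not self.finished:
            self.inside += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.inside:
            self.inside -= 1
            if self.inside == 0:
                self.finished = True

    def handle_data(self, data):
        if self.inside and not self.finished:
            self.chunks.append(data)


def tag_text(html, tag):
    """Return the stripped text of the first `tag` element, or None if missing or empty."""
    parser = FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() or None


def scrape(urls, fetch, tag):
    """Fetch each URL, parse it, and pull one tag's text, one URL at a time.

    `fetch` is a hypothetical callable url -> HTML string, or None on failure
    (for example lambda u: requests.get(u).text).
    """
    results = {}
    for url in urls:
        html = fetch(url)
        results[url] = tag_text(html, tag) if html is not None else None
    return results
```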
From my understanding, this seems ideal for grequests:
GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.
However, the two approaches (one with requests, one with grequests) give me different results: with grequests, some of the requests return None rather than a Response.
Using requests
import requests
tickers = [
'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI',
'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN',
'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
]
BASE = 'https://finance.google.com/finance?q={}'
rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
rs = list(rs)
rs
# [<Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# ...
# <Response [200]>]
# All are okay (status_code == 200)
Using grequests
# Restarted my interpreter and redefined `tickers` and `BASE`
import grequests
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs)
rs
# [None,
# <Response [200]>,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>]
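Because grequests.map returns one slot per request, in the same order as the requests it was given, the None entries above can be matched back to their tickers and, if desired, retried. A minimal sketch; failed_tickers and fill_failures are hypothetical helpers, and the choice of retry function (for example plain requests.get, one at a time) is an assumption.

```python
def failed_tickers(tickers, responses):
    """Tickers whose slot in the map() result is None.

    Assumes `responses` is aligned with `tickers`, i.e. the requests were
    built from `tickers` in order and passed straight to grequests.map.
    """
    return [t for t, r in zip(tickers, responses) if r is None]


def fill_failures(urls, responses, fetch):
    """Re-fetch only the failed slots, keeping successful responses as-is.

    `fetch` is any callable url -> response (hypothetical, e.g. requests.get).
    """
    return [fetch(u) if r is None else r for u, r in zip(urls, responses)]
```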
Why the difference in results?
Update: I can print the exceptions by passing an exception handler, as follows. There is a related discussion here, but I have no idea what's going on.
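def exception_handler(request, exception):
    print(exception)

# Rebuild the generator first: the earlier `rs` already holds the map() results
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs, exception_handler=exception_handler)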
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
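The handler above only prints the error. A minimal sketch of a handler that also records which URL failed, assuming the request object passed in exposes a .url attribute (grequests' AsyncRequest appears to). As far as I can tell, map() puts the handler's return value into the result list, so returning None keeps a None in that slot, as before. The failures list is a hypothetical name.

```python
failures = []  # (url, exception repr) pairs, filled in by the handler


def exception_handler(request, exception):
    """Record which request failed and why, then return None.

    Uses getattr so a request object without a .url attribute still works.
    """
    failures.append((getattr(request, "url", None), repr(exception)))
    return None
```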
System/version info
- requests: 2.18.4
- grequests: 0.3.0
- Python: 3.6.3
- urllib3: 1.22
- pyopenssl: 17.2.0
- All via Anaconda
- System: the same issue occurs on both macOS High Sierra and Windows 10 (build 10.0.16299)
Answer 1:
You are just sending requests too fast. Because grequests is an async library, all of these requests are sent almost simultaneously, and there are too many of them at once.
You just need to limit the number of concurrent requests with grequests.map(rs, size=your_choice). I have tested grequests.map(rs, size=10) and it works well.
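As far as I can tell from the grequests 0.3.0 source, map() with the default size=None sends every request at once, while size=N runs them through a gevent pool of N. The sketch below is a standard-library analogue using asyncio, not grequests or gevent: FakeServer is a hypothetical server that drops any connection beyond a fixed number of simultaneous ones, standing in for the failed TLS handshakes. Capping concurrency with a semaphore plays the role of size=N.

```python
import asyncio


class FakeServer:
    """Hypothetical server that drops any connection beyond `limit` concurrent ones."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0  # connections currently open
        self.peak = 0    # highest number of simultaneous connections seen

    async def handle(self, url):
        self.active += 1
        self.peak = max(self.peak, self.active)
        try:
            if self.active > self.limit:
                # Stands in for "bad handshake: ... Unexpected EOF".
                raise ConnectionError(f"connection dropped for {url}")
            await asyncio.sleep(0.01)  # simulated network latency
            return f"200 {url}"
        finally:
            self.active -= 1


async def fetch_all(urls, server, size=None):
    """Fetch every URL; failures become None, in input order (like grequests.map).

    size=None means unbounded concurrency; an integer caps it, like map(..., size=N).
    """
    sem = asyncio.Semaphore(size) if size else None

    async def fetch_one(url):
        try:
            if sem is None:
                return await server.handle(url)
            async with sem:  # at most `size` requests in flight
                return await server.handle(url)
        except ConnectionError:
            return None

    return await asyncio.gather(*(fetch_one(u) for u in urls))
```

With 30 URLs and a server limit of 10, the unbounded run loses the requests that arrive after the first 10, while size=10 lets every request through because no more than 10 are ever open at once.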
Answer 2:
I do not know the exact reason for the observed behavior with .map(). However, using the .imap() function with size=1 always returned a 'Response 200' during my few minutes of testing. Here is the code snippet:
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
rsm_list = [r for r in rsm_iterator]
print(rsm_list)
And if you don't want to wait for all requests to finish before working on their responses, you can consume the iterator as results arrive:
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
for r in rsm_iterator:
    print(r)
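When working on responses as they arrive, note that imap appears to yield them in completion order rather than request order (with size=1 the two coincide). A minimal sketch for recovering which ticker a response belongs to, assuming the response's final URL still carries the q= parameter built from BASE (a redirect could change that). ticker_from_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def ticker_from_url(url):
    """Return the value of the q= query parameter, or None if the URL has none."""
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None


# Hypothetical usage in the loop above, where r is a requests.Response:
#     for r in rsm_iterator:
#         print(ticker_from_url(r.url), r.status_code)
```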
Source: https://stackoverflow.com/questions/46205491/understanding-requests-versus-grequests