Question
I'm trying to access cookies after I've made a request using Splash. Below is how I've built the request.
script = """
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    })
    assert(splash:wait(0.5))

    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        url = splash:url(),
        headers = last_response.headers,
        http_status = last_response.status,
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""
req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script,
        'endpoint': 'execute'
    }
)
The script is an exact copy from the Splash documentation.
So I'm trying to access the cookies that are set by the webpage. When I'm not using Splash, the code below works as I expect it to, but not when I'm using Splash.
self.logger.debug('Cookies: %s', response.headers.get('Set-Cookie'))
While using Splash, this returns:
2017-01-03 12:12:37 [spider] DEBUG: Cookies: None
When I'm not using Splash, this code works and returns the cookies provided by the webpage.
The Splash documentation shows this code as an example:
def parse_result(self, response):
    # here response.body contains result HTML;
    # response.headers are filled with headers from last
    # web page loaded to Splash;
    # cookies from all responses and from JavaScript are collected
    # and put into Set-Cookie response header, so that Scrapy
    # can remember them.
I'm not sure whether I'm understanding this correctly, but I'd say I should be able to access the cookies in the same way as when I'm not using Splash.
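For reference, this is roughly how I read them in a non-Splash spider (a minimal sketch; getlist is Scrapy's standard way to read a repeated header, and parse_page is my callback):

def parse_page(self, response):
    # A response can set several cookies, so read every Set-Cookie
    # value; Scrapy header values are bytes.
    for set_cookie in response.headers.getlist('Set-Cookie'):
        self.logger.debug('Cookie: %s', set_cookie.decode('utf-8'))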
Middleware settings:
# Download middlewares
DOWNLOADER_MIDDLEWARES = {
    # Use a random user agent on each request
    'crawling.middlewares.RandomUserAgentDownloaderMiddleware': 400,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
    # Enable crawlera proxy
    'scrapy_crawlera.CrawleraMiddleware': 600,
    # Enable Splash to render javascript
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
So my question is: how do I access cookies while using a Splash request?
Answer 1:
You can set the SPLASH_COOKIES_DEBUG=True option to see all cookies that are being set. The current cookiejar, with all cookies merged, is available as response.cookiejar when scrapy-splash is configured correctly.
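For illustration, a rough sketch of reading that jar in your callback (an assumption here is that iterating Scrapy's cookiejar yields stdlib http.cookiejar.Cookie objects):

def parse_page(self, response):
    # response.cookiejar is attached by scrapy_splash.SplashCookiesMiddleware
    # and merges cookies from all responses and from JavaScript.
    for cookie in response.cookiejar:
        self.logger.debug('Cookie: %s=%s (domain=%s)',
                          cookie.name, cookie.value, cookie.domain)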
Using response.headers.get('Set-Cookie') is not robust because in the case of redirects (e.g. JS redirects) there can be several responses, and a cookie could be set by the first one while the script returns headers only for the last response.
I'm not sure if this is a problem you're having though; the code is not an exact copy from the Splash docs. Here:
req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script
    }
)
you're sending the request to the /render.json endpoint; it doesn't execute Lua scripts. Use endpoint='execute' to fix that.
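For completeness, a sketch of the fixed request. Note that endpoint is a keyword argument of SplashRequest itself; putting 'endpoint': 'execute' inside args, as the question's code currently does, won't select the endpoint:

req = SplashRequest(
    url,
    self.parse_page,
    endpoint='execute',  # run the Lua script via Splash's /execute endpoint
    args={
        'wait': 0.5,
        'lua_source': script,
    }
)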
Source: https://stackoverflow.com/questions/41442465/read-cookies-from-splash-request