Question
I have written Scrapy code to log in to a site. First I tried it on one site, and it worked well. Then I changed the URL and tried it on another site, but it does not work there. I used the same code without any change. What could be the problem?
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class QuoteSpider(scrapy.Spider):
    name = 'Quote'
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = (
        'http://quotes.toscrape.com/login',
    )

    def parse(self, response):
        token = response.xpath('//input[@name="csrf_token"]/@value').extract_first()
        return FormRequest.from_response(
            response,
            formdata={'csrf_token': token, 'password': 'foo', 'username': 'foo'},
            callback=self.scape_home_page,
        )

    def scape_home_page(self, response):
        open_in_browser(response)
This worked well.
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class BucketsSpider(scrapy.Spider):
    name = 'buckets'
    allowed_domains = ['http://collegekart.in/login']
    start_urls = ['http://collegekart.in/login/']

    def parse(self, response):
        token = response.xpath('//meta[@name="csrf-token"]/@content').extract_first()
        print(token)
        return FormRequest.from_response(
            response,
            formdata={'csrf-token': token, 'password': '*********', 'username': '**************'},
            callback=self.scape_home_page,
        )

    def scape_home_page(self, response):
        open_in_browser(response)
        print("yes")
This one does not work. Please help me solve this.
Answer 1:
What's wrong
- `FormRequest.from_response(response, ...)`: if you check `response.url`, it gives you http://collegekart.in/login/ instead of http://collegekart.in/
- `allowed_domains = ['http://collegekart.in/login']`: the login GET request to collegekart.in/ is not covered by your `allowed_domains`
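The `response.url` point matters because `from_response` resolves the form's `action` attribute against the response URL, much as `urllib.parse.urljoin` does. A minimal stdlib sketch, assuming the form's action is the relative path `access/attempt_login` (inferred from the login URLs in this answer, not checked against the live site):

```python
from urllib.parse import urljoin

# Assumption: the login form's action attribute is this relative path.
ACTION = "access/attempt_login"

# Base URL Scrapy actually fetched (note the trailing slash).
wrong = urljoin("http://collegekart.in/login/", ACTION)
# Base URL after response.replace(url='http://collegekart.in/').
right = urljoin("http://collegekart.in/", ACTION)
# Without the trailing slash, urljoin drops the last path segment ("login").
no_slash = urljoin("http://collegekart.in/login", ACTION)

print(wrong)     # http://collegekart.in/login/access/attempt_login
print(right)     # http://collegekart.in/access/attempt_login
print(no_slash)  # http://collegekart.in/access/attempt_login
```

The last line suggests that dropping the trailing slash from `start_urls` might also work, but only if the site serves the login page at that URL without redirecting back to `/login/`.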
How to fix it
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class BucketsSpider(scrapy.Spider):
    name = 'buckets'
    allowed_domains = ['collegekart.in']  # a bare domain, not a URL
    start_urls = ['http://collegekart.in/login/']

    def parse(self, response):
        token = response.xpath('//meta[@name="csrf-token"]/@content').extract_first()
        print(token)
        # Resolve the form's relative action against the site root, not /login/
        response = response.replace(url='http://collegekart.in/')
        return FormRequest.from_response(
            response,
            formdata={'csrf-token': token, 'password': 'hanfenghanfeng', 'username': 'zerqqr1@iydhp.com'},
            callback=self.scape_home_page,
        )

    def scape_home_page(self, response):
        open_in_browser(response)
        print("yes")
Why?
If you don't replace the `url` of `response`, Scrapy will send your request to an incorrect URL (note the extra login/):
http://collegekart.in/login/access/attempt_login?utf8=%E2%9C%93&username=zerqqr1%40iydhp.com&password=hanfenghanfeng
This is the correct URL:
http://collegekart.in/access/attempt_login?utf8=%E2%9C%93&username=zerqqr1%40iydhp.com&password=hanfenghanfeng
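The query string in both URLs is just the form fields URL-encoded, because the login form is submitted with GET. A minimal stdlib sketch reproducing it, assuming the three fields and their order are exactly those visible in the URL above (Scrapy's `from_response` does this encoding itself; this only illustrates the result):

```python
from urllib.parse import urlencode

# Assumption: these are the fields the form submits, in the order shown in the URL.
fields = {
    "utf8": "\u2713",  # a check mark, which encodes to %E2%9C%93
    "username": "zerqqr1@iydhp.com",
    "password": "hanfenghanfeng",
}

query = urlencode(fields)
login_url = "http://collegekart.in/access/attempt_login?" + query
print(query)  # utf8=%E2%9C%93&username=zerqqr1%40iydhp.com&password=hanfenghanfeng
```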
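The login GET URL is not included in `allowed_domains`:
- allowed_domains = ['http://collegekart.in/login']
- Login GET URL: http://collegekart.in/access/.......
`allowed_domains` should contain bare domain names such as collegekart.in, not URLs; otherwise Scrapy's offsite filtering drops the login request.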
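To see why the URL-style entry fails, here is a simplified, stdlib-only approximation of the host check behind Scrapy's offsite filtering (an assumption for illustration, not Scrapy's actual code): a request passes only if its host equals an allowed domain or is a subdomain of one.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals an allowed domain or is a subdomain of one.

    Simplified approximation of Scrapy's offsite check, not Scrapy's own code.
    """
    host = urlparse(url).hostname or ""
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.match(pattern, host) is not None


login_url = "http://collegekart.in/access/attempt_login"
print(is_allowed(login_url, ["http://collegekart.in/login"]))  # False: a URL never matches a host
print(is_allowed(login_url, ["collegekart.in"]))               # True
```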
Suggestions
- Use Chrome's Inspector > Network to see the actual request being made when you perform the login action.
- Check this official Scrapy tutorial (PDF version): Link
Answer 2:
Change the response URL accordingly; this will solve the problem.
Source: https://stackoverflow.com/questions/47259090/log-in-not-working-using-scrapy