Question

I have written Scrapy code to log in to a site. I first tried it on one site, and it worked well. I then changed the URL and tried it on another site, and it does not work there. I used the same code without any changes. What could be the problem?

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

class QuoteSpider(scrapy.Spider):
    name = 'Quote'
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = (
        'http://quotes.toscrape.com/login',
    )

    def parse(self, response):
        token=response.xpath('//input[@name="csrf_token"]/@value').extract_first()

        return FormRequest.from_response(response,formdata={'csrf_token':token,'password':'foo','username':'foo'},callback=self.scape_home_page)

    def scape_home_page(self, response):
        open_in_browser(response)

This worked well.

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

class BucketsSpider(scrapy.Spider):
    name = 'buckets'
    allowed_domains = ['http://collegekart.in/login']
    start_urls = ['http://collegekart.in/login/']

    def parse(self, response):
        token=response.xpath('//meta[@name="csrf-token"]/@content').extract_first()
        print(token)
        return FormRequest.from_response(response,formdata={'csrf-token':token,'password':'*********','username':'**************'},callback=self.scape_home_page)

    def scape_home_page(self, response):
        open_in_browser(response)
        print("yes")

This is not working. Please help to solve this.

Answer 1:

What's wrong

FormRequest.from_response(response, ...)
- If you check response.url, it gives you http://collegekart.in/login/ instead of http://collegekart.in/, and the login request URL is built relative to it.

allowed_domains = ['http://collegekart.in/login']
- The login GET request on collegekart.in is not covered by your allowed_domains, because the entry is a full URL rather than a domain name such as collegekart.in.
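
To illustrate the allowed_domains point, here is a minimal sketch of a domain check. It is a simplified stand-in, not Scrapy's actual offsite filtering code, and the helper name host_allowed is hypothetical. It assumes a request is allowed when its host equals an entry or is a subdomain of one.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Simplified check (an assumption, not Scrapy's real code):
    the URL's host must equal an entry or be a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


login_url = "http://collegekart.in/access/attempt_login"

# A full URL as an entry never equals the bare host "collegekart.in".
print(host_allowed(login_url, ["http://collegekart.in/login"]))  # False

# A bare domain name matches the host.
print(host_allowed(login_url, ["collegekart.in"]))  # True
```

Under this assumption, the entry 'http://collegekart.in/login' can never match any request host, which is why the fix uses 'collegekart.in'.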

How to fix it

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

class BucketsSpider(scrapy.Spider):
    name = 'buckets'
    allowed_domains = ['collegekart.in']
    start_urls = ['http://collegekart.in/login/']

    def parse(self, response):
        token=response.xpath('//meta[@name="csrf-token"]/@content').extract_first()
        print(token)
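        # Rebase the response on the site root so the login request goes to
        # http://collegekart.in/access/... instead of http://collegekart.in/login/access/...
        response = response.replace(url='http://collegekart.in/')
        return FormRequest.from_response(
            response,
            formdata={'csrf-token': token, 'password': 'hanfenghanfeng', 'username': 'zerqqr1@iydhp.com'},
            callback=self.scape_home_page,
        )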

    def scape_home_page(self, response):
        open_in_browser(response)
        print("yes")

Why?

If you don't replace the url in the response:
- Scrapy will send your request to an incorrect URL: http://collegekart.in/login/access/attempt_login?utf8=%E2%9C%93&username=zerqqr1%40iydhp.com&password=hanfenghanfeng
- This is the correct URL: http://collegekart.in/access/attempt_login?utf8=%E2%9C%93&username=zerqqr1%40iydhp.com&password=hanfenghanfeng

The login GET URL is not included in allowed_domains:
- allowed_domains = ['http://collegekart.in/login']
- Login GET URL: http://collegekart.in/access/.......
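
The two URLs above can be reproduced with the standard library. This is a sketch that assumes the login form's action is the relative path access/attempt_login (inferred from the URLs listed here, not checked against the live page). It shows how a relative path resolves differently depending on the base URL.

```python
from urllib.parse import urljoin

# Assumption: the form's action attribute is this relative path.
form_action = "access/attempt_login"

# Base URL taken from the original response (the login page).
wrong = urljoin("http://collegekart.in/login/", form_action)

# Base URL after response.replace(url='http://collegekart.in/').
right = urljoin("http://collegekart.in/", form_action)

print(wrong)  # http://collegekart.in/login/access/attempt_login
print(right)  # http://collegekart.in/access/attempt_login
```

If the form action were an absolute path starting with a slash, both base URLs would resolve to the same address, so it is worth checking the form's action attribute in the page source.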

Suggestions

- Use Chrome's Inspector > Network tab to see the actual request that is made when you log in.
- Check the official Scrapy tutorial (a PDF version is also available).