Scrapy FormRequest: trying to send a POST request (FormRequest) with currency-change formdata

Submitted by 我只是一个虾纸丫 on 2019-12-04 17:24:40

It is all about the frontend cookie. I will show you how to do it with requests first; the logic is exactly the same with Scrapy:

import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    # Load the landing page and find the URL stored in the SAR <option>.
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # This GET sets the frontend and currency cookies.
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    # Send those cookies along with the currency parameter.
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)

You need to make a request to the URL stored in the SAR option under the element with the id selectCurrency, then pass the cookies returned by that request when you request https://www.mooda.com/en?currency=sar. There are no POSTs; everything is a GET request, but the frontend cookie set by that first GET is essential.
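To make the first step concrete, here is a minimal standard-library sketch of what the selector `#selectCurrency option[value*=SAR]` picks out. The markup and the example.com URLs are made up for illustration, and `CurrencyOptionParser` and `currency_url` are hypothetical names; the real page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, only for illustration; the real page may differ.
SAMPLE_HTML = """
<select id="selectCurrency">
  <option value="https://example.com/switch/currency/USD/">USD</option>
  <option value="https://example.com/switch/currency/SAR/">SAR</option>
</select>
"""


class CurrencyOptionParser(HTMLParser):
    """Collect the value of every <option> inside <select id="selectCurrency">."""

    def __init__(self):
        super().__init__()
        self.in_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "selectCurrency":
            self.in_select = True
        elif tag == "option" and self.in_select and attrs.get("value"):
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False


def currency_url(html, code):
    """Return the first option value containing `code`, or None."""
    parser = CurrencyOptionParser()
    parser.feed(html)
    # Same idea as option[value*=SAR]: a substring match on the value attribute.
    return next((v for v in parser.values if code in v), None)


print(currency_url(SAMPLE_HTML, "SAR"))
```

The URL this returns is the one to GET first; the response to that GET is what carries the cookies.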

If we run the code, you can see it does give us the correct data:

In [9]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
   ...:         r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
   ...:         r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
   ...:         soup2 = BeautifulSoup(r.content,"lxml")
   ...:         print(soup2.select_one(".price").text)
   ...:     

825 SR

Using Scrapy:

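from scrapy import Request, Spider


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # URL stored in the SAR <option> of the currency selector.
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        print(resp.headers.getlist('Set-Cookie'))
        # Build the cookies dict from the name=value part of each Set-Cookie header.
        cookies = dict(h.decode().split(";", 1)[0].split("=", 1) for h in resp.headers.getlist('Set-Cookie'))
        return Request("https://www.mooda.com/en?currency=sar", cookies=cookies, callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())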

Running it gives you:

['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']


[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']

The GET to curr returns nothing; it just sets the cookie.
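If you want those Set-Cookie values as a plain dict that can be passed as `cookies=`, one option is the standard library's `http.cookies.SimpleCookie`. This is only a sketch: `cookies_from_headers` is a hypothetical helper name, and the header strings are copied from the output printed above.

```python
from http.cookies import SimpleCookie

# The two Set-Cookie values from the output above.
SET_COOKIE_HEADERS = [
    "frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
    "currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
]


def cookies_from_headers(headers):
    """Return {name: value} for a list of Set-Cookie header values."""
    jar = SimpleCookie()
    for header in headers:
        # Header values may arrive as bytes (as with Scrapy on Python 3).
        if isinstance(header, bytes):
            header = header.decode("utf-8")
        jar.load(header)
    # expires/path/domain are stored as attributes of each morsel, not as cookies.
    return {name: morsel.value for name, morsel in jar.items()}


print(cookies_from_headers(SET_COOKIE_HEADERS))
```

In a spider, the same helper could be called with `resp.headers.getlist('Set-Cookie')`.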
