Scrapy FormRequest: trying to send a POST request (FormRequest) with currency-change formdata

Submitted by 妖精的绣舞 on 2019-12-06 10:09:04

Question


I've been trying to scrape the following website, but with the currency switched to 'SAR' via the settings form in the upper-left corner. I tried sending a Scrapy request like this:

from scrapy import Request

r = Request(url='https://www.mooda.com/en/',
            cookies=[{'name': 'currency', 'value': 'SAR',
                      'domain': '.www.mooda.com', 'path': '/'},
                     {'name': 'country', 'value': 'SA',
                      'domain': '.www.mooda.com', 'path': '/'}],
            dont_filter=True)

but I still get the prices in EGP:

In [10]: response.css('.price').xpath('text()').extract()
Out[10]: 
[u'1,957 EG\xa3',
 u'3,736 EG\xa3',
 u'2,802 EG\xa3',
 u'10,380 EG\xa3',
 u'1,823 EG\xa3']

I have also tried to send a POST request with the specified form data, like this:

from scrapy.http.request.form import FormRequest

url = 'https://www.mooda.com/en/'
r = FormRequest(url=url,
                formdata={'selectCurrency': 'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)

Still, it never works. I also tried FormRequest.from_response(), but that didn't work either. I'd really appreciate some advice; I'm new to Scrapy form requests, so any help would be welcome.
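For reference, this is roughly how I understand FormRequest.from_response() is meant to be used. This is only a sketch: the form id and field name below are guesses about the currency form, not taken from the site's real markup.

import scrapy

class CurrencySketchSpider(scrapy.Spider):
    # Sketch only -- 'currency-form' and the 'currency' field are assumptions,
    # not verified against the actual page.
    name = 'currency_sketch'
    start_urls = ['https://www.mooda.com/en/']

    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            formid='currency-form',        # hypothetical id of the settings form
            formdata={'currency': 'SAR'},  # hypothetical field name
            callback=self.after_switch,
        )

    def after_switch(self, response):
        self.logger.info(response.css('.price::text').extract())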


Answer 1:


It is all about the frontend cookie. I will show you how to do it with requests first; the logic is exactly the same with Scrapy:

import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # The SAR option's value attribute holds the currency-switch URL.
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    # Reuse the cookies set by the switch request when fetching the main page.
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"},
              headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)

You need to make a request to the URL stored in the option element under the select with id selectCurrency, then pass the cookies that request returns when you request https://www.mooda.com/en?currency=sar. There are no POSTs; it is all GET requests, but the frontend cookie set by that first GET is essential.
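If you want to see for yourself which cookies the switch URL sets, a quick check with requests looks like the sketch below (the cookie names you should expect are the ones shown in the Scrapy output further down):

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    switch_url = soup.select_one("#selectCurrency option[value*=SAR]")["value"]
    r2 = s.get(switch_url)
    # Expect to see the 'frontend' and 'currency' cookies here.
    print(r2.cookies.get_dict())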

If we run the code, you can see it gives us the correct data:

In [9]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
   ...:         r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
   ...:         r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
   ...:         soup2 = BeautifulSoup(r.content,"lxml")
   ...:         print(soup2.select_one(".price").text)
   ...:     

825 SR

Using Scrapy:

from scrapy import Spider, Request


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # Pull the currency-switch URL out of the SAR option.
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        # The switch URL sets the frontend/currency cookies; Scrapy's cookie
        # middleware keeps them for the rest of the crawl, so they don't need
        # to be passed explicitly on the next request.
        print(resp.headers.getlist('Set-Cookie'))
        return Request("https://www.mooda.com/en?currency=sar", callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())

Which, if you run it, will give you:

['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']


[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']

The GET to curr returns nothing useful in its body; it just sets the cookies.
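If you want to try the spider outside a full Scrapy project, one minimal way to run it is with CrawlerProcess. This is just a sketch, assuming the class S above is defined in the same file; the user agent setting is optional:

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess(settings={
    "USER_AGENT": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0",
})
process.crawl(S)   # S is the spider class defined above
process.start()    # blocks until the crawl finishes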



Source: https://stackoverflow.com/questions/38066229/scrapy-formrequest-trying-to-send-a-post-request-formrequest-with-currency-c
