Question
I've been trying to scrape the following website with the currency changed to 'SAR' via the settings form in the upper left. I tried sending a Scrapy request like this:
r = Request(url='https://www.mooda.com/en/',
            cookies=[{'name': 'currency', 'value': 'SAR', 'domain': '.www.mooda.com', 'path': '/'},
                     {'name': 'country', 'value': 'SA', 'domain': '.www.mooda.com', 'path': '/'}],
            dont_filter=True)
but I still get the prices in EG£:
In [10]: response.css('.price').xpath('text()').extract()
Out[10]:
[u'1,957 EG\xa3',
u'3,736 EG\xa3',
u'2,802 EG\xa3',
u'10,380 EG\xa3',
u'1,823 EG\xa3']
I have also tried sending a POST request with the specified form data, like this:
from scrapy.http.request.form import FormRequest
url = 'https://www.mooda.com/en/'
r = FormRequest(url=url,formdata={'selectCurrency':'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)
That did not work either. I also tried FormRequest.from_response(), with no success. I'm new to Scrapy form requests and would really appreciate some advice.
Answer 1:
It is all about the frontend cookie. I will show you how to do it with requests first; the logic is exactly the same with Scrapy:
import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    # Load the home page to find the currency switch URL
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # GET the SAR option's URL; the response sets the frontend and currency cookies
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    # Request the page again, passing the cookies from the previous response
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)
You need to make a request to the URL stored in the SAR option of the element with the id selectCurrency, then pass the cookies returned by that request when you request https://www.mooda.com/en?currency=sar. There are no POSTs; it is all GET requests, but the frontend cookie from the first GET is essential.
If we run the code, we see that it does give us the correct data:
In [9]: with requests.Session() as s:
   ...:     soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
   ...:     r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
   ...:     r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
   ...:     soup2 = BeautifulSoup(r.content,"lxml")
   ...:     print(soup2.select_one(".price").text)
   ...:
825 SR
Using Scrapy:
from scrapy import Request, Spider


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # Find the currency switch URL for SAR in the settings form
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        print(resp.headers.getlist('Set-Cookie'))
        # Scrapy's cookie middleware (enabled by default) stores the cookies
        # set by this response and sends them with the next request
        return Request("https://www.mooda.com/en?currency=sar", callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())
Running it gives:
['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']
[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']
The GET to curr returns nothing; it just sets the cookies.
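Scrapy's cookie middleware normally carries these cookies forward on its own. If you would rather pass them explicitly through Request(cookies=...), the Set-Cookie values printed in parse2 can be reduced to a plain dict. The following is only a minimal sketch: cookies_from_set_cookie is a hypothetical helper (not part of Scrapy), and the sample headers are the ones shown in the output above, written as bytes on the assumption that the code runs on Python 3.

```python
def cookies_from_set_cookie(headers):
    """Turn raw Set-Cookie header values into a {name: value} dict.

    Hypothetical helper, not part of Scrapy. Attributes such as
    expires, path and domain are dropped.
    """
    cookies = {}
    for raw in headers:
        # Header values may arrive as bytes (Python 3) or str
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        # The name=value pair is everything before the first ';'
        pair = raw.split(";", 1)[0]
        name, sep, value = pair.partition("=")
        if sep:  # skip malformed entries without '='
            cookies[name.strip()] = value.strip()
    return cookies


# Sample values copied from the spider output above
sample = [
    b"frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
    b"currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
]
print(cookies_from_set_cookie(sample))
```

Inside parse2 this could be called as cookies = cookies_from_set_cookie(resp.headers.getlist('Set-Cookie')) before building the next Request.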
Source: https://stackoverflow.com/questions/38066229/scrapy-formrequest-trying-to-send-a-post-request-formrequest-with-currency-c