scrapy-shell

Scrapy FormRequest , trying to send a post request (FormRequest) with currency change formdata

Submitted by 我只是一个虾纸丫 on 2019-12-04 17:24:40
Question: I've been trying to scrape the following website, but with the currency changed to 'SAR' via the settings form in the upper left. I tried sending a Scrapy request like this:

    r = Request(url='https://www.mooda.com/en/',
                cookies=[{'name': 'currency', 'value': 'SAR', 'domain': '.www.mooda.com', 'path': '/'},
                         {'name': 'country', 'value': 'SA', 'domain': '.www.mooda.com', 'path': '/'}],
                dont_filter=True)

and I still get the prices in EG:

    In [10]: response.css('.price').xpath('text()').extract()
    Out[10]: [u'1,957 EG\xa3', u'3,736 EG\xa3', u'2,802 EG\xa3', u'10,380 EG\xa3', u'1,823 EG\xa3']

I have also tried to

Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set

Submitted by 假如想象 on 2019-12-03 22:44:15
Question: When I try to scrape a certain web site (with both the spider and the shell), I get the following error:

    twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure
    twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]

I found out that this can happen when no user agent is set. But after setting it manually, I still got the same error. You can see the whole output of scrapy shell here: http://pastebin.com/ZFJZ2UXe Notes: I am not
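One possible cause, assuming the server filters requests that don't look like a real browser, is that setting USER_AGENT alone is not enough and the rest of the default headers give Scrapy away. A sketch of a settings.py fragment with browser-like headers (the exact values are illustrative, any modern browser's headers will do):

```python
# settings.py -- illustrative browser-like values
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/80.0.3987.132 Safari/537.36')

# Headers merged into every request that does not set them itself.
DEFAULT_REQUEST_HEADERS = {
    'Accept': ('text/html,application/xhtml+xml,'
               'application/xml;q=0.9,*/*;q=0.8'),
    'Accept-Language': 'en-US,en;q=0.5',
}
```

If the error persists even with full browser headers, the block is likely happening below HTTP (e.g. TLS fingerprinting or IP-based filtering), which headers cannot fix.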

Set headers for scrapy shell request

Submitted by 。_饼干妹妹 on 2019-12-02 20:51:44
I know that you can run scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com' to change the USER_AGENT, but how do you add request headers?

There is currently no way to add headers directly on the CLI, but you could do something like:

    $ scrapy shell
    ...
    >>> from scrapy import Request
    >>> req = Request('http://yoururl.com', headers={"header1": "value1"})
    >>> fetch(req)

This will update the current shell information with that new request. Note that the URL passed to Request must include the scheme (http:// or https://), otherwise Scrapy raises ValueError: Missing scheme in request url.

Source: https://stackoverflow.com/questions/37010524/set-headers-for-scrapy-shell-request

python convert chinese characters in url

Submitted by 瘦欲@ on 2019-11-29 12:23:10
Question: I have a URL like href="../job/jobarea.asp?C_jobtype=經營管理主管&peoplenumber=151" ; this is what is shown in inspect element. But when opened in a new tab, it shows as ../job/jobarea.asp?C_jobtype=%B8g%C0%E7%BA%DE%B2z%A5D%BA%DE&peoplenumber=151. How do I know which encoding the browser used for the conversion? When I try it with Scrapy, it produces some other format, and the request stops with a 500 internal server error. Could you please explain?

It's Traditional Chinese, so try cp950 (Microsoft's Big5 variant). In Python 2:

    # -*- coding: utf-8 -*-
    import urllib
    s = '經營管理主管'.decode('utf-8').encode('cp950')
    print urllib.quote(s)
    q = '%B8g%C0%E7%BA%DE%B2z%A5D%BA%DE'
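The answer's snippet is Python 2 (urllib.quote, str.decode). The same round-trip in Python 3, where the function moved to urllib.parse.quote and strings must be encoded to bytes explicitly, can be sketched like this; the percent-encoded string is the one from the question's URL:

```python
from urllib.parse import quote, unquote_to_bytes

s = '經營管理主管'

# Encode the text to cp950 (Big5) bytes, then percent-encode those bytes.
encoded = quote(s.encode('cp950'))
print(encoded)  # matches the C_jobtype value the browser produced

# Reverse direction: percent-decode to bytes, then decode as cp950.
decoded = unquote_to_bytes('%B8g%C0%E7%BA%DE%B2z%A5D%BA%DE').decode('cp950')
print(decoded)
```

Note that bytes like 0x67 ('g') and 0x7A ('z') are URL-safe ASCII, so quote leaves them literal, which is why the encoded string mixes %XX escapes with plain letters.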

Scrapy Shell and Scrapy Splash

Submitted by 烂漫一生 on 2019-11-28 03:43:31
We've been using the scrapy-splash middleware to pass the scraped HTML source through the Splash JavaScript engine running inside a Docker container. If we want to use Splash in the spider, we configure several required project settings and yield a Request specifying specific meta arguments:

    yield Request(url, self.parse_result, meta={
        'splash': {
            'args': {
                # set rendering arguments here
                'html': 1,
                'png': 1,
                # 'url' is prefilled from request url
            },
            # optional parameters
            'endpoint': 'render.json',  # optional; default is render.json
            'splash_url': '<url>',      # optional; overrides SPLASH_URL
            'slot_policy': scrapy_splash.SlotPolicy.PER_DOMAIN,
        }
    })
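For completeness, the "required project settings" mentioned above are, per the scrapy-splash setup instructions, roughly the following; the SPLASH_URL value is whatever address your Docker container exposes:

```python
# settings.py fragment for a scrapy-splash project
SPLASH_URL = 'http://localhost:8050'  # address of the Splash container

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make the dupefilter and HTTP cache aware of Splash-specific
# request fingerprints (two requests differing only in Splash args
# must not be collapsed into one).
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

With these in place, the library also offers SplashRequest as a shorthand that fills in the 'splash' meta key for you.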