scrapy-shell

Problem with __VIEWSTATE, __EVENTVALIDATION, __EVENTTARGET and scrapy & splash

别说谁变了你拦得住时间么 submitted on 2021-01-28 06:04:35

Question: How do I handle __VIEWSTATE, __EVENTVALIDATION and __EVENTTARGET with scrapy/splash? I tried:

    return FormRequest.from_response(response, [...]
        '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),

but this does not work.

Answer 1: You'll need to pass a dict as the formdata keyword argument. (I'd also recommend extracting the values into variables first, for readability.)

    def parse(self, response):
        vs = response.css('input#__VIEWSTATE::attr(value)').extract_first()
        ev = response.css('input#__EVENTVALIDATION::attr(value)').extract_first()  # another extraction
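For the extraction step itself, hidden ASP.NET form fields can be pulled out with the standard-library HTML parser alone; this is a stand-alone sketch (the HTML snippet and the field values are invented for illustration) of what FormRequest.from_response reads from the page:

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if a.get("type") == "hidden" and "name" in a:
                self.fields[a["name"]] = a.get("value", "")

# Placeholder HTML standing in for a real ASP.NET page
html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123"/>
  <input type="hidden" name="__EVENTVALIDATION" value="def456"/>
</form>
"""
p = HiddenInputCollector()
p.feed(html)
print(p.fields)  # {'__VIEWSTATE': 'abc123', '__EVENTVALIDATION': 'def456'}
```

The resulting dict is exactly the shape the answer says to pass as the formdata keyword argument.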

How to use python requests with scrapy?

自闭症网瘾萝莉.ら submitted on 2021-01-21 11:55:56

Question: I am trying to use requests to fetch a page and then pass the response object to a parser, but I ran into a problem:

    def start_requests(self):
        yield self.parse(requests.get(url))

    def parse(self, response):
        #pass

    builtins.AttributeError: 'generator' object has no attribute 'dont_filter'

Answer 1: You first need to download the page's response and then convert that string to an HtmlResponse object:

    from scrapy.http import HtmlResponse

    resp = requests.get(url)
    response = HtmlResponse(url="", body=resp.text, encoding='utf-8')
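The error itself is worth unpacking: calling a generator function (any def containing yield) returns a generator object without executing its body, and scrapy then fails when it looks for attributes that only Request objects carry, such as dont_filter. A plain-Python illustration, no scrapy needed:

```python
# Calling a generator function does not run its body; it immediately
# returns a generator object, which is what scrapy's scheduler receives
# here instead of the Request objects it expects.
def start_requests():
    yield "pretend-this-is-a-request"

gen = start_requests()
print(type(gen).__name__)           # generator
print(hasattr(gen, "dont_filter"))  # False
```

That missing attribute is precisely what the AttributeError in the question is complaining about.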

Why this inconsistent behaviour using scrapy shell printing results?

我的梦境 submitted on 2020-01-03 01:34:07

Question: Load the scrapy shell:

    scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"

Try a selector:

    response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]')

Note that it prints results. But now use that selector in a for statement:

    for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
        row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()

Hit return twice: nothing is printed. To print results inside the for loop, you need to call print() explicitly.
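This is standard REPL behaviour rather than a scrapy quirk: the interactive shell only auto-echoes the value of a bare top-level expression, and inside a compound statement such as a for loop nothing is echoed. A plain-Python stand-in (the hrefs list is invented for illustration):

```python
# Stand-in for the shell behaviour: a bare expression inside a loop is
# evaluated and discarded; only print() produces visible output.
hrefs = ["/report/1.html", "/report/2.html"]
collected = []
for row in hrefs:
    row                    # evaluated, result silently discarded
    collected.append(row)
    print(row)             # explicit print is required inside the loop
```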

Selenium with error Traceback (most recent call last): File “<pyshell#3>”, line 1, in <module> fb_login()

北慕城南 submitted on 2019-12-24 20:31:00

Question: I have the following code that helps me auto-fill my data and log in:

    import webbrowser
    from selenium import webdriver
    import time

    def fb_login():
        br = webdriver.Chrome('C:/Python34/Scripts/chromedriver.exe')
        br.get('https://www.facebook.com/')
        time.sleep(5)
        user = br.find_element_by_css_selector('#email')
        user.send_keys('vivian@hotmail.com')
        password = br.find_element_by_css_selector('#pass')
        password.send_keys('9416@io')
        login = br.find_element_by_css_selector('#u_0_t')
        login.click()

    fb_login()

Why am I getting this error in scrapy - python3.7 invalid syntax

前提是你 submitted on 2019-12-22 03:54:24

Question: I've had a heck of a time installing scrapy. I have it installed on my Mac, but I am getting this error when running the tutorial:

    Virtualenvs/scrapy_env/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
        def write(self, data, async=False):
                              ^
    SyntaxError: invalid syntax

I'm on the latest versions of everything as far as I can tell. Getting this up and running has been a pain, sheesh. OS High Sierra 10.13.3, Python 3.7, ipython installed. I've updated about everything I can think of.
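The root cause is that async became a reserved keyword in Python 3.7, so the old Twisted code shown in the traceback no longer compiles at all; upgrading Twisted (newer releases renamed that parameter) resolves it. The failure is easy to reproduce without Twisted:

```python
# Reproduce the failure: 'async' is a reserved keyword on Python 3.7+,
# so source using it as a parameter name fails at compile time.
src = "def write(self, data, async=False): pass"
try:
    compile(src, "<twisted-manhole-snippet>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True on Python 3.7+
```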

Set headers for scrapy shell request

半腔热情 submitted on 2019-12-20 09:37:52

Question: I know that you can run

    scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'

to change the USER_AGENT, but how do you add request headers?

Answer 1: There is currently no way to add headers directly on the CLI, but you can do something like:

    $ scrapy shell
    ...
    >>> from scrapy import Request
    >>> req = Request('yoururl.com', headers={"header1": "value1"})
    >>> fetch(req)

This will update the current shell information with that new request.

Source: https://stackoverflow.com/questions

Scrapy Shell and Scrapy Splash

拈花ヽ惹草 submitted on 2019-12-17 17:31:16

Question: We've been using the scrapy-splash middleware to pass the scraped HTML source through the Splash JavaScript engine running inside a Docker container. If we want to use Splash in the spider, we configure several required project settings and yield a Request specifying particular meta arguments:

    yield Request(url, self.parse_result, meta={
        'splash': {
            'args': {
                # set rendering arguments here
                'html': 1,
                'png': 1,
                # 'url' is prefilled from request url
            },
            # optional parameters
            'endpoint': 'render.json',
        }
    })

Scrapy shell return without response

試著忘記壹切 submitted on 2019-12-07 07:03:32

Question: I have a little problem using scrapy to crawl a website. I followed the scrapy tutorial to learn how to crawl a site, and I wanted to test it on 'https://www.leboncoin.fr', but the spider doesn't work. So I tried:

    scrapy shell 'https://www.leboncoin.fr'

But I get no response from the site:

    $ scrapy shell 'https://www.leboncoin.fr'
    2017-05-16 08:31:26 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: all_cote)
    2017-05-16 08:31:26 [scrapy.utils.log] INFO: Overridden