Question
I'm using Scrapy to parse interest rates from the Russian Central Bank website.
I'm also using the XPath Helper extension in Google Chrome to find the XPath selector I need. The selector I use in the XPath Helper console works exactly as I need.
For some reason the same query doesn't work in my spider, even though the spider does navigate to the page.
My spider code is below.
import scrapy
import urllib.parse


class RatesSpider(scrapy.Spider):
    name = 'rates'
    allowed_domains = ['cbr.ru']
    start_urls = ['https://www.cbr.ru/hd_base/zcyc_params/zcyc/?DateTo=01.10.2018']

    def parse(self, response):
        rates = response.xpath('/html/body/div/div/div/div/div/table/tbody/tr[2]/td').extract()
        yield {'Rates': rates}
The page doesn't seem to be blocked behind a login, because I can parse other elements on it.
What can I do to make my code work?
Answer 1:
The table in the page source doesn't contain that tbody node. The browser adds it while rendering the page, so just leave it out of the XPath (.../table/tbody/tr/... -> .../table//tr/...):
rates = response.xpath('/html/body/div/div/div/div/div/table//tr[2]/td').extract()
or, simplified (note that this matches every td on the page, not only those in the second row):
rates = response.xpath('//td').extract()
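To see why the tbody step matters, here is a minimal sketch that runs both path shapes against markup with no tbody element. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector, and the table contents are made-up placeholder values, not data from cbr.ru. In the spider itself the same comparison would go through response.xpath.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup as a server might send it: the table has no <tbody>.
# The cell values are placeholders for illustration only.
RAW_HTML = """
<html><body>
  <table>
    <tr><th>Term</th><th>Rate</th></tr>
    <tr><td>0.25</td><td>7.10</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)  # root is the <html> element

# Path copied from browser tooling: it expects a <tbody> the source lacks.
with_tbody = [td.text for td in root.findall("./body/table/tbody/tr[2]/td")]

# Path with the tbody step replaced by a descendant search.
without_tbody = [td.text for td in root.findall("./body/table//tr[2]/td")]

print(with_tbody)     # no matches, because there is no tbody to step through
print(without_tbody)  # the cells of the second row
```

A quick way to check a real page is to look at the downloaded source (for example with "View page source" rather than the element inspector) and see whether tbody actually appears there.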
Source: https://stackoverflow.com/questions/52771158/xpath-selector-works-in-xpath-helper-console-but-doesnt-work-in-scrapy