XPath selector works in XPath Helper console, but doesn't work in scrapy

Submitted by 。_饼干妹妹 on 2019-12-11 09:38:41

Question


I'm using Scrapy to parse interest rates from the Russian Central Bank website.

I'm also using the XPath Helper extension in Google Chrome to find the right XPath selector. The selector I use in the XPath Helper console works exactly as I need.

For some reason, the same query doesn't work in my spider, even though the spider reaches the page.

My spider code is below:

import scrapy

class RatesSpider(scrapy.Spider):
    name = 'rates'
    allowed_domains = ['cbr.ru']
    start_urls = ['https://www.cbr.ru/hd_base/zcyc_params/zcyc/?DateTo=01.10.2018']

    def parse(self, response):
        # Selector copied from XPath Helper in Chrome; in the spider it matches nothing.
        rates = response.xpath('/html/body/div/div/div/div/div/table/tbody/tr[2]/td').extract()

        yield {'Rates': rates}

The page doesn't seem to be blocked behind a login, because I can parse other elements on it.

What can I do to make my code work?


Answer 1:


The table in the page source doesn't contain that tbody node. The browser adds it while rendering the page, so just don't use it in the XPath (replace .../table/tbody/tr/... with .../table//tr/...):

rates = response.xpath('/html/body/div/div/div/div/div/table//tr[2]/td').extract()

or, simplified (note that this returns every td on the page, not only the cells of the second row):

rates = response.xpath('//td').extract()


Source: https://stackoverflow.com/questions/52771158/xpath-selector-works-in-xpath-helper-console-but-doesnt-work-in-scrapy
