Why does my Scrapy code return an empty array?

Posted by 感情迁移 on 2019-12-08 04:50:40

Question


I am building a web scraper for wunderground.com, but my code returns "[]" for inches_rain and humidity. Can anyone see why this is happening?

# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import Selector
import time

from wunderground_scraper.items import WundergroundScraperItem


class WundergroundComSpider(scrapy.Spider):
    name = "wunderground"
    allowed_domains = ["www.wunderground.com"]
    start_urls = (
        'http://www.wunderground.com/q/zmw:10001.5.99999',
    )

    def parse(self, response):
        info_set = Selector(response).xpath('//div[@id="current"]')
        list = []
        for i in info_set:
            item = WundergroundScraperItem()
            item['description'] = i.xpath('div/div/div/div/span/text()').extract()
            item['description'] = item['description'][0]
            item['humidity'] = i.xpath('div/table/tbody/tr/td/span/span/text()').extract()
            item['inches_rain'] = i.xpath('div/table/tbody/tr/td/span/span/text()').extract()
            list.append(item)
        return list

I know that the humidity and inches_rain items use the same XPath. That is intentional: once the extracted values are in a list, I plan to assign each item the element it needs from that list.


Answer 1:


Let me suggest a more reliable and readable XPath that is anchored on the column label rather than on the page layout. For example, to get the "Humidity" value, find the cell that contains the "Humidity" label and read the cell that follows it:

"".join(i.xpath('.//td[dfn="Humidity"]/following-sibling::td//text()').extract()).strip()

At the time of writing, this outputs 45%.


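FYI, your XPath has at least one problem: the tbody tag. Remove it from the XPath expression. Browsers often insert tbody into a table when they render the page, so it shows up in the browser's developer tools even when it is absent from the raw HTML that Scrapy downloads.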



Source: https://stackoverflow.com/questions/31191070/why-does-my-scrapy-code-return-an-empty-array
