Scrapy - Extract items from table

天命终不由人 2021-01-02 09:42

Trying to get my head around Scrapy but hitting a few dead ends.

I have 2 tables on a page and would like to extract the data from each one, then move along to the

3 Answers
  • 2021-01-02 10:29

    Your code needs a small correction. Because products already holds the rows selected from the table, the XPath inside the loop should be relative to each row rather than pointing at the table again. You can shorten it to something like td[1]//text().

    def parse_products(self, response):
        # select every row of the table inside the element with id "Year1"
        products = response.xpath('//*[@id="Year1"]/table//tr')
        # ignore the table header row
        for product in products[1:]:
            item = Schooldates1Item()
            # paths are relative to the current row
            item['hol'] = product.xpath('td[1]//text()').extract_first()
            item['first'] = product.xpath('td[2]//text()').extract_first()
            item['last'] = product.xpath('td[3]//text()').extract_first()
            yield item
    

    Edited my answer since @stutray provided a link to the site.
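
To try the row-relative lookups outside a running spider, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors. The sample markup, its dates, and the extract_rows helper are assumptions made for illustration; the thread does not show the real page. ElementTree only understands a small XPath subset, so the cell text is read through .text instead of //text().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page (not the real site markup).
HTML = """
<html><body>
  <div id="Year1">
    <table>
      <tr><th>Holiday</th><th>First day</th><th>Last day</th></tr>
      <tr><td>Autumn half term</td><td>24 Oct</td><td>28 Oct</td></tr>
      <tr><td>Christmas</td><td>19 Dec</td><td>2 Jan</td></tr>
    </table>
  </div>
</body></html>
"""


def extract_rows(html, table_id):
    """Return one dict per data row of the table inside the element with the given id."""
    root = ET.fromstring(html)
    # Select the rows once, starting from the container element.
    rows = root.findall(f".//*[@id='{table_id}']/table//tr")
    items = []
    for row in rows[1:]:  # skip the header row
        # Each lookup is relative to the current row, so the table is not named again.
        cells = [row.find(f"td[{i}]") for i in (1, 2, 3)]
        hol, first, last = (c.text if c is not None else None for c in cells)
        items.append({"hol": hol, "first": first, "last": last})
    return items


items = extract_rows(HTML, "Year1")
for item in items:
    print(item)
```

In a spider the same shape applies: select the rows once from the response, then call xpath on each row with a path that starts at td.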

  • 2021-01-02 10:30

    I got it working with these xpaths for the html source you've provided:

    products = sel.xpath('//*[@id="Y1"]/table//tr')
    # skip the header row, then read one cell per column
    for p in products[1:]:
        item = dict()
        item['hol'] = p.xpath('td[1]/text()').extract_first()
        item['first'] = p.xpath('td[2]/text()').extract_first()
        item['last'] = p.xpath('td[3]/text()').extract_first()
        yield item
    

    The above assumes that each table row contains one item.

  • 2021-01-02 10:33

    You can use CSS selectors instead of XPath expressions; I always find CSS selectors easier.

    def parse_products(self, response):
        # iterate over the table rows, skipping the header row
        for product in response.css("#Y1 table tr")[1:]:
            item = Schooldates1Item()
            item['hol'] = product.css('td:nth-child(1)::text').extract_first()
            item['first'] = product.css('td:nth-child(2)::text').extract_first()
            item['last'] = product.css('td:nth-child(3)::text').extract_first()
            yield item
    

    Also, do not use the <tbody> tag in selectors. Source:

    Firefox, in particular, is known for adding <tbody> elements to tables. Scrapy, on the other hand, does not modify the original page HTML, so you won’t be able to extract any data if you use <tbody> in your XPath expressions.
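
The effect of a stray <tbody> step can be reproduced with a small sketch. It uses the standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the sample markup is an assumption: a table as a server might send it, with no <tbody> element in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as delivered by the server: the table has no <tbody>.
RAW_HTML = """
<div id="Y1">
  <table>
    <tr><th>Holiday</th></tr>
    <tr><td>Easter</td></tr>
  </table>
</div>
"""

root = ET.fromstring(RAW_HTML)

# A path copied from browser developer tools often includes the inserted <tbody>.
with_tbody = root.findall("./table/tbody/tr")

# A path that does not depend on <tbody> matches the rows in the raw source.
without_tbody = root.findall("./table//tr")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 2
```

Using //tr under the table, as the answers above do, matches the rows whether or not a <tbody> is present in the source.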
