Scrapy - Scraping links found while scraping

误落风尘 2021-01-28 03:56

I can only presume this is one of the most basic things to do in Scrapy but I just cannot work out how to do it. Basically, I scrape one page to get a list of urls that contain

1 answer
  • 2021-01-28 04:11

    For every link you find in parse, yield a request for it and handle the response in a second callback:

    import scrapy
    from bs4 import BeautifulSoup


    class MySpider(scrapy.Spider):
        name = "myspider"
    
        start_urls = [ .....
        ]
    
        def parse(self, response):
            # The first row is skipped (assumed to be the table header).
            rows = response.css('table.apas_tbl tr').extract()
            for row in rows[1:]:
                soup = BeautifulSoup(row, 'lxml')
                dates = soup.find_all('input')
                url = "http://myurl{}.com/{}".format(dates[0]['value'], dates[1]['value'])
                # Schedule the linked page; Scrapy passes its response to parse_page_contents.
                yield scrapy.Request(url, callback=self.parse_page_contents)
    
        def parse_page_contents(self, response):
            rows = response.xpath('//div[@id="apas_form"]').extract_first()
            soup = BeautifulSoup(rows, 'lxml')
            pages = soup.find(id='apas_form_text')
            for link in pages.find_all('a'):
                # Built but not requested here; a URL needs a scheme before it can be requested.
                url = 'http://myurl.com/{}'.format(link['href'])
    
            resultTable = soup.find("table", { "class" : "apas_tbl" })
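
Building follow-up URLs by string formatting, as the answer does, is fragile when an href is relative, and Scrapy rejects a request whose URL has no scheme. A minimal sketch of resolving hrefs against the URL of the page they were found on, using only the standard library, might look like this. The helper name absolute_links, the page URL, and the hrefs are hypothetical and chosen for illustration only.

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical page URL and hrefs, for illustration only.
page_url = "http://example.com/search/results?page=1"
hrefs = ["results?page=2", "/detail/42", "http://other.example.org/x"]

links = absolute_links(page_url, hrefs)
for link in links:
    print(link)
```

Inside a spider callback, response.urljoin(href) performs the same resolution against response.url, so its result can be passed straight to scrapy.Request.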
    