Start Scrapy from a Flask route

猫巷女王i asked on 2021-01-13 06:26

I want to build a crawler that takes the URL of a webpage to be scraped and returns the results to a webpage. Right now I start Scrapy from the terminal and store the r…

1 Answer
  • 2021-01-13 06:57

    You need to create a CrawlerProcess inside your Flask application and run the crawl programmatically. See the docs.

    import scrapy
    from scrapy.crawler import CrawlerProcess
    
    class MySpider(scrapy.Spider):
        # Your spider definition
        ...
    
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    
    process.crawl(MySpider)
    process.start() # The script will block here until the crawl is finished
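One caveat when calling this from a Flask view: `process.start()` runs Twisted's reactor, which cannot be restarted within the same process, so a second request would fail with `ReactorNotRestartable`. A minimal sketch of a common workaround is to launch each crawl in a fresh subprocess (here `crawl_script.py` is a hypothetical placeholder for a file containing the snippet above):

```python
import subprocess
import sys

def run_crawl(script, url):
    # Launch the crawler in a fresh Python process so Twisted's reactor
    # starts clean on every request; returns the subprocess exit code.
    result = subprocess.run(
        [sys.executable, script, url],
        capture_output=True,
        text=True,
        timeout=300,  # don't let a hung crawl block the request forever
    )
    return result.returncode
```

A Flask route can then call `run_crawl(...)` directly, or hand it to a background worker as the answer suggests below.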
    

    Before moving on with your project I advise you to look into a Python task queue (like rq). This will allow you to run Scrapy crawls in the background and your Flask application will not freeze while the scrapes are running.
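For instance, an rq job is just a plain Python function; a worker process imports and runs it, so the Flask view returns immediately after enqueueing. A minimal sketch, assuming Redis is running, rq is installed, and `scrapy` is on the PATH (`my_spider.py` is a hypothetical spider file):

```python
# tasks.py
import subprocess

def crawl_job(url):
    # Runs the spider via the real `scrapy runspider` CLI command in its
    # own process, so the rq worker's state stays clean between jobs.
    result = subprocess.run(
        ["scrapy", "runspider", "my_spider.py", "-a", f"url={url}"],
        capture_output=True,
        text=True,
    )
    return result.returncode

# In the Flask view (assumes `pip install rq` and a running redis-server):
# from redis import Redis
# from rq import Queue
# queue = Queue(connection=Redis())
# job = queue.enqueue(crawl_job, "https://example.com")
# # start a worker separately with:  rq worker
```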
