Starting Scrapy from a Django view

面向向阳花 2021-02-04 12:01

My experience with Scrapy is limited, and every time I use it, it's through terminal commands. How can I get my form data (a url to be scraped) from my django temp

1 answer
  •  一生所求
    2021-02-04 12:13

    You've actually answered it with an edit. The best option is to set up a scrapyd service and make an API call to its schedule.json endpoint to trigger a scraping job.

    To make that HTTP API call, you can either use urllib2 (urllib.request in Python 3) or requests directly, or use python-scrapyd-api, a wrapper around the scrapyd API:

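    from scrapyd_api import ScrapydAPI

    # Connect to the scrapyd service (6800 is scrapyd's default port)
    scrapyd = ScrapydAPI('http://localhost:6800')
    # Queue a run of the spider; schedule() returns the id of the new job
    job_id = scrapyd.schedule('project_name', 'spider_name')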
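
    If you would rather not add the wrapper as a dependency, the same call can be made with the standard library. This is only a rough sketch: it assumes scrapyd is listening on localhost:6800 as above, and the helper names build_schedule_request and schedule_spider are made up for illustration. Extra keyword arguments (such as the url from the form) are sent as additional POST fields, which scrapyd passes on to the spider as spider arguments.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build the POST request for scrapyd's schedule.json endpoint.

    Hypothetical helper: any extra keyword arguments are sent as
    additional form fields alongside project and spider.
    """
    url = urljoin(base_url.rstrip("/") + "/", "schedule.json")
    payload = {"project": project, "spider": spider, **spider_args}
    body = urlencode(payload).encode("utf-8")
    return Request(url, data=body, method="POST")


def schedule_spider(base_url, project, spider, **spider_args):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(base_url, project, spider, **spider_args)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

    From a Django view you would then call something like schedule_spider('http://localhost:6800', 'project_name', 'spider_name', url=form_url), where form_url stands for whatever your form submitted. The call returns as soon as scrapyd has queued the job, so the view does not wait for the crawl itself.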
    

    If we put scrapyd aside and try to run the spider directly from the view, the request will block until the Twisted reactor stops, so that is not really an option.

    You can, however, use Celery (in tandem with django_celery): define a task that runs your Scrapy spider and call that task from your Django view. This puts the crawl on the queue, so the user does not have to wait for it to finish.
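
    One possible shape for such a task, as a sketch only: the function below launches the crawl in a child process with the scrapy crawl command, which is my own assumption about how the task body might look rather than the only way to do it. The names build_crawl_command and run_spider, and the idea that the spider accepts a url argument, are hypothetical.

```python
import subprocess


def build_crawl_command(spider_name, url):
    """Return the argv for `scrapy crawl`, passing the url as a spider argument."""
    return ["scrapy", "crawl", spider_name, "-a", f"url={url}"]


def run_spider(spider_name, url, project_dir):
    """Run the crawl in a child process and return its exit code.

    project_dir is assumed to be the directory containing scrapy.cfg.
    """
    completed = subprocess.run(
        build_crawl_command(spider_name, url),
        cwd=project_dir,
        check=False,
    )
    return completed.returncode


# With Celery installed, the function could be registered as a task:
#     from celery import shared_task
#     run_spider_task = shared_task(run_spider)
# and queued from the Django view without waiting for the result:
#     run_spider_task.delay("spider_name", form_url, "/path/to/project")
```

    Running the crawl in its own process means each job gets a fresh Twisted reactor, so the worker is not affected by the reactor having already been stopped by an earlier crawl.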


    Also, take a look at the django-dynamic-scraper package:

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface.
