This is Windows 7 with Python 2.7.
I have a Scrapy project in a directory called caps (this is where scrapy.cfg is).
My spider is located in caps\caps\spiders\c
I also had this problem, and it turned out to be rather small. Be sure your class inherits from scrapy.Spider:

class MyClass(scrapy.Spider):
Without a project, use runspider with the file name; inside a project, use crawl with the spider's name.

Example: C:\user> scrapy runspider myFile.py
You have to give your spider a name. Also note that BaseSpider is deprecated; use Spider instead:

from scrapy.spiders import Spider

class campSpider(Spider):
    name = 'campSpider'
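If the name attribute is missing, Scrapy refuses to instantiate the spider at all. A simplified, pure-Python sketch of that check (not Scrapy's actual code; the class names here are made up for illustration):

```python
class SketchSpider(object):
    """Mimics the name check that scrapy.Spider performs in __init__."""
    name = None

    def __init__(self):
        if not getattr(self, "name", None):
            raise ValueError("%s must have a name" % type(self).__name__)


class NamedSpider(SketchSpider):
    name = "campSpider"   # with a name set, instantiation succeeds


class UnnamedSpider(SketchSpider):
    pass                  # no name: instantiation raises ValueError


NamedSpider()             # fine
try:
    UnnamedSpider()
except ValueError as e:
    print(e)              # prints "UnnamedSpider must have a name"
```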
The project should have been created by the startproject command:
scrapy startproject project_name
Which gives you the following directory tree:
project_name/
    scrapy.cfg            # deploy configuration file
    project_name/         # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py
            ...
Make sure that settings.py defines your spider module, e.g.:

BOT_NAME = 'bot_name'  # usually equal to your project name

SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'

You should have no problem running your spider locally or on Scrapinghub.
An improper name for the Python file can also lead to this error (for example crawler.py or scrapy.py).
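The reason a file named scrapy.py breaks things is that Python resolves imports from the script's directory before site-packages, so your file shadows the real scrapy package. A small demonstration of the same effect using a throwaway file named json.py (json stands in for scrapy here so the demo needs nothing installed):

```python
import os
import sys
import tempfile

# Create a directory containing a file named "json.py" (it plays the role
# of naming your spider file "scrapy.py").
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "json.py"), "w") as f:
    f.write("shadowed = True\n")

sys.path.insert(0, tmp)           # like running a script from that directory
for mod in list(sys.modules):
    if mod == "json" or mod.startswith("json."):
        del sys.modules[mod]      # force a fresh import

import json                       # picks up the local json.py, not the stdlib

print(getattr(json, "shadowed", False))  # True: the local file won the lookup
```

The same lookup order means `import scrapy` inside your spider file would import the file itself instead of the framework, so none of the expected classes exist.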
Try running scrapy list on the command line; if there is any error in a spider, it will detect it.

In my case, I had bluntly copied code from another project and forgot to change the project name in the spider module import.
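Under the hood, scrapy crawl resolves the string you pass on the command line against each spider class's name attribute, which is why the two must match exactly. A rough, pure-Python sketch of that lookup (hypothetical classes; not Scrapy's actual loader):

```python
class CampSpider(object):
    name = "campSpider"


class NewsSpider(object):
    name = "newsSpider"


# A registry like the one Scrapy builds from the SPIDER_MODULES setting:
# spider name -> spider class.
spiders = {cls.name: cls for cls in (CampSpider, NewsSpider)}


def crawl(spider_name):
    # "scrapy crawl <name>" performs a lookup much like this one.
    try:
        return spiders[spider_name]
    except KeyError:
        raise KeyError("Spider not found: %s" % spider_name)


print(sorted(spiders))    # ['campSpider', 'newsSpider'] -- what "scrapy list" would show
crawl("campSpider")       # resolves fine; an unknown name raises KeyError
```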
The name attribute in a CrawlSpider subclass defines the spider's name, and this name is used on the command line to invoke the spider.
from scrapy.spiders import CrawlSpider, Rule      # scrapy.contrib.spiders is deprecated
from scrapy.linkextractors import LinkExtractor   # note the plural module name


class NameSpider(CrawlSpider):
    name = 'name of spider'
    allowed_domains = ['allowed domains of web portal to be scraped']
    start_urls = ['start url of the web portal to be scraped']

    custom_settings = {
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    }

    product_css = ['.main-menu']
    rules = [
        # CrawlSpider uses parse() internally, so give the callback its own name.
        Rule(LinkExtractor(restrict_css=product_css), callback='parse_item'),
    ]

    def parse_item(self, response):
        # implementation of business logic
        pass